Looking for Advice on Managing Python Environments in Anaconda

Hey everyone,

I have been using Anaconda for a while now, but I still struggle a bit with managing environments efficiently. I end up creating multiple environments for different projects, and before I know it, my system is cluttered with unused ones.

I'd like to know: how do you all keep your Anaconda environments organized? Do you have a set strategy for naming them, or do you regularly clean up old environments? Also, is there a best practice for sharing an environment with others, such as when collaborating on a project? I usually use conda env export, but sometimes dependencies don't sync up perfectly on another machine.

For example, when working on machine learning projects, I often need different versions of libraries, and things can get messy quickly. Any tips, tricks, or tools you use to make environment management smoother?

Thank you… :smiley:

I'm currently writing my ML portfolio and have spent the last two days trawling through examples and taking notes on best practices and all the different code. I'm used to using venvs, but they were killing my hard drive space.
If you have Notion I can send you my pages of notes (they're well organised, as I'm using them to go alongside my Python helper package). I'm still in the middle of making them look pretty and finishing them off, but if you don't have Notion, the general gist is:

  1. Helper packages with environments: I'm currently working on a package of 40-odd ML functions to use as helpers so I don't have to type out all the train/test boilerplate thousands of times (there's a hypothetical sketch of one at the end of this post). The main hidden benefit was that it helped me learn things better as well, so I know when to one-hot encode, run t-SNE, or whatever.
    The main thing is that it comes with a conda-forge library set ready to go, so I start most ML projects by installing that on top of:
  2. Start off with a clean environment every time; don't load anything onto base unless it's critical:
conda create -n clean_env python pip numpy pandas scikit-learn scipy pyarrow umap-learn faker

conda activate clean_env
  3. Use a meta YAML for each area of the ML work you're doing, but more importantly, export a clean environment.yml once you have your build:

conda env export --no-builds > environment.yml

    Add the --from-history flag if you only want to capture the libraries you explicitly installed (not their auto-installed dependencies):

conda env export --from-history > environment.yml

    I do this with all of mine to keep them clean. It should look something like this once it's generated:
name: clean_env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pip
  - numpy
  - pandas
  - scikit-learn
  - scipy
  - pyarrow
  - umap-learn
  - faker

Once you have the environment.yml, you have your build, so you can go ahead and delete the conda environment when you're done.
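For reference, the round trip looks something like this (a minimal sketch, assuming the clean_env name from above):

conda env remove -n clean_env          # Delete the environment once it's exported
conda env create -f environment.yml    # Rebuild it later from the file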
  4. If you have a giant base conda environment where everything was installed, you can rebuild it by recreating it. First take a backup, then completely reset the base environment (mine currently has 641 packages):

conda deactivate  # Make sure no environment is active
conda list --explicit > base_env_backup.txt  # Back up the existing base packages
conda clean --all --yes  # Clear unused packages and caches

Then reinstall conda itself:
conda install conda
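If what you actually want is to roll base back to its freshly-installed state, conda also keeps a per-environment revision history you can revert to (this is standard conda behaviour, though it isn't part of my notes above):

conda list --revisions    # Show the change history for the active environment
conda install --revision 0    # Revert base to its original state (revision 0)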

If you're still using any dependencies installed via pip, make sure they are essential to your project. For packages not available on conda-forge (e.g., umap-learn might pull subpackages from pip), keep a section in your environment.yml specifically for pip-managed dependencies:

dependencies:
  - python=3.10
  - pip
  - numpy
  - pandas
  - scikit-learn
  - scipy
  - pyarrow
  - faker
  - pip:
      - umap-learn

So as long as you're using conda-forge, you can create your custom envs, then store them just as a .yml for future use should you come back to that topic.
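To give a flavour of the helper functions from point 1, here's a minimal hypothetical sketch of one (the name and signature are purely illustrative, not from the actual package):

# Hypothetical helper: wraps the usual train/test split + scaling boilerplate
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def split_and_scale(X, y, test_size=0.2, random_state=42):
    """Split features/target and standardise the features in one call."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)  # Fit the scaler on training data only
    X_test = scaler.transform(X_test)        # Reuse the training-set statistics
    return X_train, X_test, y_train, y_test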

@davidbarnesguildford's approach is also what I would recommend. To add to what David Barnes has said, using an environment.yml gives the following benefits:

  1. You can version control an environment.yml file.
    – The best way to manage a software project is through version control, and that includes recording what software the project requires. This enables rolling back unwanted changes and debugging the software over time if an update breaks things.
  2. Environment files allow you to specify the packages your project requires and their version ranges.
    – The issue with conda env export is that it captures the environment at too fine-grained a level. It lists not only the packages your project directly relies on, but also the packages those packages rely on, pinned to exact versions and even build numbers for conda packages. This creates a very brittle definition of an environment, because build numbers often do not transfer across operating systems, and sometimes exact version numbers do not either. It also makes the project maintainer responsible for keeping the dependencies of the project's dependencies at the correct versions, since those are listed explicitly as well: when a dependency of your project updates one of its own dependencies, you would have to make that update manually, even though your project does not use that dependency directly.
  3. An environment.yml file is far more reproducible and shareable.
    – Because an environment.yml file lists only the dependencies your project uses, and it can define conda version spec ranges for them, a collaborator is far more likely to be able to reproduce the environment on their machine. And if they cannot, it is often just a package or two that need a version change, or a range expanded, to enable installing. As David Barnes said, this also allows you to delete large AI project environments on your machine and reproduce them when needed (see the example below).
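For example, a hand-maintained environment.yml with version ranges might look like this (the package names and ranges here are purely illustrative):

name: ml_project
channels:
  - conda-forge
dependencies:
  - python>=3.10,<3.13
  - numpy>=1.26
  - pandas>=2.0
  - scikit-learn>=1.4
  - pip
  - pip:
      - umap-learn>=0.5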

Here is documentation on creating environments from a yml file.

Here is documentation for exporting environments across platforms.