Anaconda

Python is a high level programming language that is widely used in many branches of science. As a result, many scientific packages have been developed in Python, leading to the development of a package manager called Anaconda. Anaconda is the standard in Python package management for scientific research.

Benefits of Anaconda:

  • Shareability: environments can be shared via human-readable text-based YAML files.
  • Maintainability: the same YAML files can be version controlled using git.
  • Repeatability: environments can be rebuilt using those same YAML files.
  • Simplicity: dependency matrices are computed and solved by Anaconda, and libraries are pre-built and stored on remote servers for download instead of being built on your local machine.
  • Ubiquity: nearly all Python developers are aware of the usage of Anaconda, especially in scientific research, so there are many resources available for learning how to use it, and what to do if something goes wrong.

Anaconda can also install Pip and record which Pip packages are installed, so Anaconda can do everything Pip can, and more.

What is my best solution for installing Anaconda?

If you are using a local machine or doing general purpose software development, or have a particular package in mind, see Installing Anaconda below.

If you are using a virtual machine or container, see Installing Miniconda below.

If you are using Cheaha, go here for how to use Anaconda on Cheaha.

Installing Anaconda

The full Anaconda install is a good choice if you are using a local machine, or doing general Python development work, or have a particular scientific package in mind.

Anaconda installation instructions are located here: https://docs.anaconda.com/anaconda/install/index.html.

Installing Miniconda

Miniconda is a lightweight version of Anaconda. While Anaconda's base environment comes with Python, the Scipy stack, and other common packages pre-installed, Miniconda comes with no packages installed. This is an excellent alternative to the full Anaconda installation for environments where minimal space is available or where setup time is important, like virtual machines and containers.

Miniconda installation instructions are located here: https://docs.conda.io/en/latest/miniconda.html.

Using Anaconda

Anaconda is a package manager, meaning it handles all of the difficult mathematics and logistics of figuring out exactly which versions of which packages should be downloaded to meet your needs, and informs you if there is a conflict.

Anaconda is structured around environments. Environments are self-contained collections of researcher-selected packages. Environments can be swapped with a single command, without tedious installing and uninstalling of packages or software, and without dependency conflicts between environments. Environments allow researchers to work and collaborate on multiple projects, each with different requirements, all on the same computer. Environments can be installed from the command line or from pre-designed or shared YAML files, and can be modified or updated as needed.

The following subsections detail some of the more common commands and use cases for Anaconda usage. More complete information on this process can be found at the Anaconda documentation.

Create an Environment

In order to create a basic environment with the default packages, use the conda create command:

# create a basic environment. Replace <env> with an environment name
conda create -n <env>

If you are trying to replicate a pipeline or analysis from another person, you can also recreate an environment using a YAML file, if they have provided one. To replicate an environment using a YAML file, use:

# replicate an environment from a YAML file named env.yml
conda env create -n <env> -f <path/to/env.yml>

By default, all of your conda environments are stored in /home/<user>/.conda/envs.

Activate an Environment

From here, you can activate the environment using either source or conda:

# activate the virtual environment using source (older syntax)
source activate <env>

# or using conda
conda activate <env>

To know your environment has loaded, the command line should look like:

(<env>) [blazerid@c0XXX ~]$

Once the environment is activated, you can install whichever Python libraries you need for your analysis.
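
In a batch script or other non-interactive shell there is no prompt to inspect. A minimal sketch of an alternative check, relying on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated. The function name active_env is a hypothetical helper, not part of conda:

```shell
# Print the name of the active conda environment, or "none" if no
# environment has been activated in this shell.
active_env() {
    if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "$CONDA_DEFAULT_ENV"
    else
        echo "none"
    fi
}

active_env
```

A script could compare this value against the environment it expects and stop early if they differ.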

Install Packages

To install packages using Anaconda, use the conda install command. The -c or --channel option can be used to select a specific package channel to install from. The anaconda channel is a curated collection of high-quality packages, but the very latest versions may not be available on this channel. The conda-forge channel is more open, less carefully curated, and has more recent versions.

# install most recent version of a package
conda install <package>

# install a specific version
conda install <package>=<version>

# install from a specific conda channel (the =<version> part is optional)
conda install -c <channel> <package>=<version>

Generally, if a package needs to be downloaded from a specific conda channel, it will mention that in its installation instructions.

Installing Packages with Pip

Some packages are not available through Anaconda. Often these packages are available via PyPI and can be installed using Pip, the Python built-in package manager. Pip may also be used to install locally-available packages.

# install most recent version of a package
pip install <package>

# install a specific version, note the double equals sign
pip install <package>==<version>

# install a list of packages from a text file
pip install -r packages.txt
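
The packages.txt file used above is a plain-text list with one requirement per line. A minimal sketch of creating one from the shell follows; the package names and versions are borrowed from the examples later on this page purely for illustration:

```shell
# Write a small requirements file: one package per line, == pins a version.
cat > packages.txt <<'EOF'
jinja2==2.11.2
numpy==1.21.5
EOF

# Show what pip would be asked to install.
cat packages.txt

# With the target environment activated, install everything listed:
# pip install -r packages.txt
```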

Finding Packages

You may use the Anaconda page to search for packages on Anaconda, or use Google with something like <package name> conda. To find packages in PyPI, either use the PyPI page to search, or use Google with something like <package name> pip.

Packages for Jupyter

If you are using Anaconda with Jupyter, you will need to be sure to install the ipykernel package for your environment to be recognized by the Jupyter Server. If you are using Jupyter in Open OnDemand then you do not need to install the jupyter package.

Deactivating an Environment

An environment can be deactivated using the following command.

# Using conda
conda deactivate

If you use source deactivate instead, Anaconda may warn that it is deprecated, but the environment will still be deactivated.

Closing the terminal will also close out the environment.

Working with Environment YAML Files

Exporting an Environment

To easily share an environment with other researchers or replicate it on a new machine, it is useful to create an environment YAML file. You can do this using:

# activate the environment if it is not active already
conda activate <env>

# export the environment to a YAML file
conda env export > env.yml

Creating an Environment from a YAML File

To create an environment from a YAML file env.yml, use the following command.

conda env create --file env.yml
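
The new environment takes its name from the name: field of the YAML file. When working with a file someone else wrote, it can help to read that name before creating the environment. A small sketch, where env_name is a hypothetical helper and not a conda command:

```shell
# Print the environment name declared on the "name:" line of a YAML file.
env_name() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Demonstrate on a throwaway file with the same layout as env.yml.
demo_file=$(mktemp)
printf 'name: test-env\nchannels:\n  - anaconda\n' > "$demo_file"
env_name "$demo_file"

# Typical use with a real file:
# conda env create --file env.yml
# conda activate "$(env_name env.yml)"
```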

Replicability versus Portability

An environment with only python 3.10.4, numpy 1.21.5 and jinja2 2.11.2 installed will output something like the file shown below when conda env export is used. This file may be used to precisely replicate the environment as it exists on the machine where conda env export was run. Note that the versioning for each package contains two = signs. The code after the second = sign, like he774522_0, contains hyper-specific build information for the compiled libraries for that package. Sharing this exact file with collaborators may result in frustration: if they do not have the exact same operating system and hardware as you, they would not be able to build this environment. We would say that this environment file is not very portable.

There are other portability issues:

  • The prefix: C:\... line is not used by conda in any way and is deprecated. It also exposes file locations on your system, which is potentially sensitive information.
  • The channels: group uses - defaults, which may vary depending on how you or your collaborator has customized their Anaconda installation. Packages may not be found as a result, causing environment creation to fail.

The exported file:

name: test-env
channels:
  - defaults
dependencies:
  - blas=1.0=mkl
  - bzip2=1.0.8=he774522_0
  - ca-certificates=2022.4.26=haa95532_0
  - certifi=2021.5.30=py310haa95532_0
  - intel-openmp=2021.4.0=haa95532_3556
  - jinja2=2.11.2=pyhd3eb1b0_0
  - libffi=3.4.2=h604cdb4_1
  - markupsafe=2.1.1=py310h2bbff1b_0
  - mkl=2021.4.0=haa95532_640
  - mkl-service=2.4.0=py310h2bbff1b_0
  - mkl_fft=1.3.1=py310ha0764ea_0
  - mkl_random=1.2.2=py310h4ed8f06_0
  - numpy=1.21.5=py310h6d2d95c_2
  - numpy-base=1.21.5=py310h206c741_2
  - openssl=1.1.1o=h2bbff1b_0
  - pip=21.2.4=py310haa95532_0
  - python=3.10.4=hbb2ffb3_0
  - setuptools=61.2.0=py310haa95532_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlite=3.38.3=h2bbff1b_0
  - tk=8.6.11=h2bbff1b_1
  - tzdata=2022a=hda174b7_0
  - vc=14.2=h21ff451_1
  - vs2015_runtime=14.27.29016=h5e58377_2
  - wheel=0.37.1=pyhd3eb1b0_0
  - wincertstore=0.2=py310haa95532_2
  - xz=5.2.5=h8cc25b3_1
  - zlib=1.2.12=h8cc25b3_2
prefix: C:\Users\user\Anaconda3\envs\test-env

To make this a more portable file, suitable for collaboration, some planning is required. Instead of using conda env export we can build our own file. Create a new file called env.yml using your favorite text editor and add the following. Note we've listed only the packages we explicitly installed, along with their version numbers. This gives Anaconda the flexibility to choose dependencies that do not conflict, and leaves out the unusable hyper-specific library build information.

name: test-env
channels:
  - anaconda
dependencies:
  - jinja2=2.11.2
  - numpy=1.21.5
  - python=3.10.4

This is a much more readable and portable file suitable for sharing with collaborators. We aren't quite finished though! Some scientific packages on the conda-forge channel, and on other channels, can contain dependency errors. Those packages may accidentally pull a version of a dependency that breaks their code.

For example, the package markupsafe made a not-backward-compatible change (a breaking change) to their code between 2.0.1 and 2.1.1. Dependent packages expected 2.1.1 to be backward compatible, so their packages allowed 2.1.1 as a substitute for 2.0.1. Since Anaconda chooses the most recent version allowable, package installs broke. To work around this for our environment, we would need to modify the environment to "pin" that package at a specific version, even though we didn't explicitly install it.

name: test-env
channels:
  - anaconda
dependencies:
  - jinja2=2.11.2
  - markupsafe=2.0.1
  - numpy=1.21.5
  - python=3.10.4

Now we can be sure that the correct versions of the software will be installed on our collaborators' machines.

Note

The example above is provided only for illustration purposes. The error has since been fixed, but the example above really happened and is helpful to explain version pinning.
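
When an environment already exists, a hand-written file like the ones above does not have to start from a blank page. conda env export accepts a --no-builds flag that omits build strings and a --from-history flag that lists only the packages that were explicitly requested. For an export file that has already been shared, the sketch below does similar cleanup with sed. strip_builds is a hypothetical helper, and its output should still be reviewed and trimmed by hand, since it keeps every dependency rather than only the packages you chose.

```shell
# Read an exported environment file on stdin and write a more portable
# version on stdout: drop the "prefix:" line and remove the build string
# (the part after the second "=") from each conda dependency.
# Pip requirements such as "pkg==1.0" are left untouched.
strip_builds() {
    sed -e '/^prefix:/d' \
        -e 's/^\([[:space:]]*-[[:space:]]*[^=[:space:]][^=[:space:]]*=[^=][^=]*\)=.*$/\1/'
}

# Demonstrate on a few lines taken from the exported file above.
strip_builds <<'EOF'
name: test-env
channels:
  - defaults
dependencies:
  - numpy=1.21.5=py310h6d2d95c_2
  - python=3.10.4=hbb2ffb3_0
prefix: C:\Users\user\Anaconda3\envs\test-env
EOF
```

For a real file, something like strip_builds < exported.yml > env.yml would write the cleaned copy, where exported.yml is a placeholder file name.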

Good Software Development Practice

Building on the example above, we can bring in good software development practices to ensure we don't lose track of how our environment is changing as we develop our software or our workflows. If you've ever lost a lot of hard work by accidentally deleting an important file, or forgetting what changes you've made that need to be rolled back, this section is for you.

Efficient software developers live the mantra "Don't repeat yourself". Part of not repeating yourself is keeping a detailed and meticulous record of changes made as your software grows over time. Git is a way to have the computer keep track of those changes digitally. Git can be used to save changes to environment files as they change over time. Each time your environment changes, remember to commit the output of Exporting an Environment to a repository for your project.
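
A minimal sketch of that habit, run in a scratch directory so it does not touch a real project. The file contents, commit messages, and the Example Researcher identity are placeholders; in practice env.yml would come from the export step, and git would already know your name and email.

```shell
# Work in a throwaway repository so no real project is modified.
demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q

# Placeholder identity, stored only in this scratch repository.
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

# First version of the environment file.
printf 'name: test-env\nchannels:\n  - anaconda\ndependencies:\n  - python=3.10.4\n' > env.yml
git add env.yml
git commit -q -m "Add environment file"

# The environment changes: record the new package in the same file.
printf '  - numpy=1.21.5\n' >> env.yml
git commit -q -am "Add numpy to environment"

# Review how the environment has changed over time.
git log --oneline -- env.yml
```

Each commit gives a point to roll back to if a later change to the environment breaks something.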

Speeding Things up with Mamba

Mamba is an alternative to Anaconda that uses libsolv and parallel processing to install environments more quickly, sometimes by an order of magnitude. Mamba will also discover conflicts very quickly. Mamba is available as a package via Anaconda. Currently Mamba cannot be installed on Cheaha, only on self-managed systems like cloud.rc instances. To install Mamba, use the following.

conda activate base
conda update --all
conda install -n base -c conda-forge mamba

Warning

Mamba must be installed in the base environment to function correctly! If you are using Cheaha, and cannot install in the base environment, see our workaround here


Last update: June 9, 2022