Publishing Your Package In A Community Repository: PyPI or Anaconda.org#
pyOpenSci requires that your package has a distribution that can be installed from a public community repository such as PyPI or a conda channel such as bioconda or conda-forge on Anaconda.org.
Below you will learn more about the various publishing options for your Python package.
Take Aways
Installing packages in the same environment using both pip and conda can lead to package conflicts.
To minimize conflicts for users who may be using conda (or pip) to manage local environments, consider publishing your package to both PyPI and the conda-forge channel on Anaconda.org.
Below you will learn more specifics about the differences between PyPI and conda publishing of your Python package.
What is PyPI#
PyPI is an online Python package repository that you can use to find, install, and publish Python packages. There is also a test PyPI repository where you can test publishing your package prior to the final publication on PyPI.
Many if not most Python packages can be found on PyPI and are thus installable using pip.
The biggest difference between using pip and conda to install a package is that conda can install any package regardless of the language(s) that it is written in, whereas pip can only install Python packages.
Click here for a tutorial on publishing your package to PyPI.
Tip
On the package build page, we discussed the two package distribution types that you will create when making a Python package: SDist (packaged as a .tar.gz or .zip) and Wheel (.whl) which is really a zip file. Both of those file “bundles” will be published on PyPI when you use a standard build tool to build your package.
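To make the two distribution types concrete, here is a minimal sketch that classifies a distribution file by its extension and opens a wheel with Python's standard `zipfile` module. The package name `examplepkg` and both helper functions are hypothetical, and the archive built here is only a stand-in with a `.whl` name, not a valid wheel.

```python
import tempfile
import zipfile
from pathlib import Path


def distribution_type(filename: str) -> str:
    """Classify a distribution file by its extension."""
    if filename.endswith(".whl"):
        return "wheel"
    if filename.endswith((".tar.gz", ".zip")):
        return "sdist"
    return "unknown"


def list_archive_members(path: Path) -> list[str]:
    """Return the member names of a wheel by opening it as a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Build a tiny stand-in archive with a .whl name. It is NOT a valid wheel
# (it lacks real metadata); it only shows that the container is a zip file.
with tempfile.TemporaryDirectory() as tmp:
    fake_wheel = Path(tmp) / "examplepkg-0.1.0-py3-none-any.whl"
    with zipfile.ZipFile(fake_wheel, "w") as archive:
        archive.writestr("examplepkg/__init__.py", "")
        archive.writestr("examplepkg-0.1.0.dist-info/METADATA", "Name: examplepkg\n")
    is_zip = zipfile.is_zipfile(fake_wheel)
    members = list_archive_members(fake_wheel)
```

You could point `list_archive_members` at a wheel produced by your own build to see which files it actually ships.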
What is conda and Anaconda.org?#
conda is an open source package and environment management tool. conda can be used to install tools from the Anaconda repository.
Anaconda.org contains public and private repositories for packages. These repositories are known as channels (discussed below).
A brief history of conda’s evolution
The conda ecosystem evolved years ago to provide support for, and simplify the process of, managing software dependencies in scientific Python projects.
Many of the core scientific Python projects depend upon or wrap around tools and extensions that are written in other languages, such as C++. In the early stages of the scientific ecosystem’s development, these non-Python extensions and tools were not well supported on PyPI, making publication difficult. In recent years, support has improved for complex builds that allow developers to bundle non-Python code into a Python distribution using the wheel distribution format.
Conda provides a mechanism to manage these dependencies and ensure that the required packages are installed correctly.
Tip
While conda was originally created to support Python packages, it is now used across all languages. This cross-language support makes it easier for some packages to include and have access to tools written in other languages, such as C/C++ (gdal), Julia, or R. Creating an environment that mixes all of these packages is usually easier and more consistent with full-fledged package managers like conda.
conda channels#
Packages built for conda are housed in repositories called channels. The conda package manager can install packages from different channels.
There are several core public channels that most people use to install packages using conda, including:
defaults: this is a channel managed by Anaconda. It is the version of the Python packages that you will install if you install the Anaconda Distribution. Anaconda (the company) decides what packages live on the defaults channel.
conda-forge: this is a community-driven channel that focuses on scientific packages. This channel is ideal for tools that support geospatial data. Anyone can publish a package to this channel.
bioconda: this channel focuses on biomedical tools.
conda-forge emerged because many scientific packages did not exist in Anaconda's defaults channel.
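As an illustration of how a channel is selected at install time, the sketch below builds the argument list for a `conda install` call that names a channel explicitly. The helper `conda_install_command` and the package name `examplepkg` are hypothetical; the command is only constructed here, not executed.

```python
def conda_install_command(package: str, channel: str = "conda-forge") -> list[str]:
    """Build the arguments for installing a package from a named conda channel."""
    return ["conda", "install", "--channel", channel, package]


# Hypothetical package name; swap in a real one before running the command.
forge_command = conda_install_command("examplepkg")
bioconda_command = conda_install_command("examplepkg", channel="bioconda")

print(" ".join(forge_command))
```

Passing one of these lists to `subprocess.run` would perform the installation, assuming conda is available on your PATH.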
conda channels, PyPI, conda, pip - Where to publish your package#
You might be wondering why there are different package repositories that can be used to install Python packages.
And more importantly you are likely wondering how to pick the right repository to publish your Python package.
The answer to both questions relates to dependency conflicts.
Managing Python package dependency conflicts#
Python environments can encounter conflicts because Python tools can be installed from different repositories. Broadly speaking, Python environments have a smaller chance of dependency conflicts when the tools are installed from the same package repository. Thus environments that contain packages installed from both pip and conda are more likely to yield dependency conflicts.
Similarly, mixing packages installed from the defaults channel with packages installed from the conda-forge channel can also lead to dependency conflicts.
Many people install packages directly from conda's defaults channel. However, because this channel is managed by Anaconda, the packages available on it are limited to those that Anaconda decides should be core to a stable installation. The conda-forge channel was created to complement the defaults channel. It allows anyone to submit a package to be published in the channel. Thus, the conda-forge channel ensures that a broad suite of user-developed community packages can be installed using conda.
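One rough way to check whether an environment mixes installers is to read the `INSTALLER` file that installers may record in each package's metadata. The sketch below uses the standard-library `importlib.metadata` module. It assumes that conda-built packages record `conda` in that file, which may not hold for every package, and the sample data and helper names are hypothetical.

```python
from collections import defaultdict
from importlib import metadata


def group_by_installer(records):
    """Group (package name, installer) pairs by installer name."""
    groups = defaultdict(list)
    for name, installer in records:
        # Packages without a recorded installer are grouped as "unknown".
        groups[installer or "unknown"].append(name)
    return {installer: sorted(names) for installer, names in groups.items()}


def installed_records():
    """Yield (name, installer) for each distribution in the current environment."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        # INSTALLER is optional metadata; read_text returns None if it is absent.
        installer = dist.read_text("INSTALLER") or ""
        yield name, installer.strip()


# Hypothetical data, so the example does not depend on your own environment.
example = [("numpy", "conda"), ("examplepkg", "pip"), ("scipy", "conda"), ("oldpkg", "")]
grouped = group_by_installer(example)
```

Calling `group_by_installer(installed_records())` reports on the environment you are running in. Seeing more than one installer is not an error by itself, but it can be a useful hint when you are tracking down a dependency conflict.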
Take-aways: If you can, publish on both PyPI and conda-forge to accommodate more users of your package#
The take-away here for maintainers is that if you anticipate users wanting to use conda to manage their local environments (which many do), you should consider publishing to both PyPI and the conda-forge channel (more on that below).
How to submit to conda-forge#
While pyOpenSci doesn’t require you to add your package to conda-forge, we encourage you to consider doing so!
Once your package is on PyPI, the process to add your package to conda-forge is straightforward. You can follow the detailed steps provided by the conda-forge maintainer team.
Click here for a tutorial on adding your package to conda-forge.
If you want a step by step tutorial, click here.
Once your package is added, you will have a feedstock repository on GitHub with your package's name.
Maintaining your conda-forge package repository#
Once your package is on the conda-forge channel, maintaining it is simple. Every time that you push a new version of your package to PyPI, it will kick off a continuous integration build that updates your package in the conda-forge repository. Once that build is complete, you will get a notification to review the update.
You can merge the pull request for that update once you are happy with it. A ready-to-merge PR usually means ensuring that your project’s dependencies (known as runtime requirements) listed in the updated YAML file found in the pull request match the PyPI metadata of the new release.
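The dependency check described above can be sketched as a comparison of project names. This is a minimal sketch, not part of the conda-forge tooling: the function names and sample requirement lists are hypothetical, it compares names only (not version constraints), and it assumes each dependency has the same name on PyPI and conda-forge, which is not always the case.

```python
import re


def requirement_name(requirement: str) -> str:
    """Extract and normalize the project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"Cannot parse requirement: {requirement!r}")
    # Normalize names: lowercase, with runs of '-', '_' and '.' collapsed to '-'.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def compare_requirements(recipe_run, pypi_requires, ignore=("python",)):
    """Report names that appear in only one of the two requirement lists."""
    recipe_names = {requirement_name(r) for r in recipe_run} - set(ignore)
    pypi_names = {requirement_name(r) for r in pypi_requires} - set(ignore)
    return {
        "missing_from_recipe": sorted(pypi_names - recipe_names),
        "extra_in_recipe": sorted(recipe_names - pypi_names),
    }


# Hypothetical values standing in for the run requirements in the recipe YAML
# and the dependencies declared in the metadata of the new PyPI release.
recipe_run = ["python >=3.9", "numpy >=1.22", "pandas"]
pypi_requires = ["numpy>=1.22", "pandas", "xarray>=2023.1"]
report = compare_requirements(recipe_run, pypi_requires)
```

In this made-up case the report flags `xarray` as missing from the recipe, which is the kind of mismatch to fix in the pull request before merging.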