The Scope of Packages that pyOpenSci Reviews#

What types of packages does pyOpenSci review?#

pyOpenSci reviews higher-level software packages that support scientific workflows.

Diagram showing the tiers of software in the Python ecosystem, starting with Python itself; as you move outward, packages become more domain-specific. In this image, packages like xarray and numpy are considered core to scientific Python, while packages and distributions like astropy, simpeg and metpy are considered domain-specific. pyOpenSci’s review process focuses on domain-specific packages rather than core packages, because domain-specific packages tend to vary more in long-term maintenance, infrastructure, and quality than established core packages. Source: “Jupyter meets earth” project

Currently, the packages that pyOpenSci reviews also need to fall into the technical and applied scope of our organization. This scope may expand over time as the organization grows.

Is Your Package in Scope For pyOpenSci Review?#

pyOpenSci only reviews packages that fall within our specified domain and technical scope listed below.

If you are unsure whether your package is in scope for review, please open a pre-submission inquiry using a GitHub Issue to get feedback from one of our editors. We are happy to look at your package and help you understand whether it is in scope or not.

Python package domain scope#

The following categories are the current domain areas that fall into the pyOpenSci domain scope.

  • Data retrieval: Packages for accessing and downloading data from online sources. Includes wrappers for accessing APIs.

  • Data extraction: Packages that aid in retrieving data from unstructured sources such as text, images and PDFs.

  • Data munging: Tools for processing data from scientific data formats.

  • Data deposition: Tools for depositing data in scientific research repositories.

  • Reproducibility: Tools that help scientists ensure that their research is reproducible, e.g. version control, automated testing, or citation tools.

  • Geospatial: Packages focused on the retrieval, manipulation, and analysis of spatial data.

  • Education: Packages to aid with instruction.

  • Data visualization: Packages for visualizing and analyzing data.

Package technical scope#

To be in technical scope for a pyOpenSci review, your package:

  • Should have maintenance workflows documented.

  • Should be structured in a way that someone else could contribute to it.

  • Should declare dependencies using standard approaches rather than vendoring them, that is, including code from other packages within your repository.

Notes on scope categories#

  • pyOpenSci is still developing as a community. If your scientific Python package does not fit into one of the categories or if you have any other questions, we’d encourage you to open a pre-submission inquiry. We’re happy to help.

  • Data visualization packages come in many varieties, ranging from small hyper-specific methods for one type of data to general, do-it-all packages (e.g. matplotlib). pyOpenSci accepts packages that are somewhere in between the two. If you’re interested in submitting your data visualization package, please open a pre-submission inquiry first.

Python package technical scope#

pyOpenSci may continue to update its technical scope criteria for package review as more packages with varying structural approaches are reviewed. Your package may not be in technical scope for us to review at this time if it fits any of the out-of-technical-scope criteria listed below.

Important

If the code base of your package is exceedingly complex in terms of structure or maintenance needs, we may not be able to review it.

pyOpenSci has a goal of supporting long-term maintenance of open source Python tools. It is thus important for us to know that, if you need to step down as a maintainer, the package infrastructure and documentation are in place to help us find a new maintainer who can take over your package’s maintenance.

Examples of technically complex package structures that may be difficult for us to review

Example 1: Your package is an out-of-sync fork of another package repository that is being actively maintained.

We understand that a package maintainer may sometimes need to step down. In that case, we strongly suggest that the original package owner transfer the package repository, along with its PyPI credentials, to a new organization. A new organization allows ownership of package maintenance to be transferred, rather than several forks existing side by side.

If your package is a divergent fork of a maintained repository, we will encourage you to work with the original maintainers to merge efforts.

However, if you believe a forked repository is warranted, please submit a pre-submission inquiry first and explain why the package is a fork rather than an independent parent repository.

Example 2: Vendored dependencies

If your package is a wrapper around another tool, we prefer that the tool be declared as a dependency of your package rather than copied into it. This allows maintenance of the original code base to be independent from your package’s maintenance.
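
The sketch below illustrates the idea under simplifying assumptions: the wrapper imports the installed upstream tool at runtime instead of shipping a bundled copy. The helper names `load_upstream` and `to_text` are hypothetical, and the standard-library `json` module stands in for a real upstream tool so that the example runs anywhere.

```python
import importlib
from types import ModuleType


def load_upstream(module_name: str) -> ModuleType:
    """Import the installed upstream tool instead of a bundled copy."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is required; install it as a declared dependency "
            "rather than copying its source into this repository."
        ) from err


# The standard-library `json` module stands in for the upstream tool here.
upstream = load_upstream("json")


def to_text(record: dict) -> str:
    """Thin wrapper: delegate the real work to the upstream tool."""
    return upstream.dumps(record, sort_keys=True)


print(to_text({"b": 1, "a": 2}))  # prints {"a": 2, "b": 1}
```

Because the wrapper only calls the upstream tool's public interface, fixes released upstream reach users through a normal dependency upgrade, with no changes needed in the wrapper's repository.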

Package Overlap#

pyOpenSci encourages competition among packages, forking, and re-implementation, as they improve the options available to users. However, we want packages in the pyOpenSci suite to represent our top recommendations for the tasks that they perform. We aim to avoid duplicating the functionality of existing Python packages in any repo without significant improvements. A Python package that replicates the functionality of an existing package may be considered for inclusion in the pyOpenSci suite if it significantly improves on alternatives by being:

  • More open in licensing or development practices

  • Broader in functionality (e.g., providing access to more data sets, providing a greater suite of functions), but not only by duplicating additional packages

  • Better in usability and performance

  • Actively maintained while alternatives are poorly or no longer actively maintained

These factors should be considered as a whole to determine if the package is a significant improvement. A new package would not meet this standard only by following our package guidelines while others do not, unless this leads to a significant difference in the areas above.

We recommend that packages highlight differences from and improvements over overlapping packages in their README and/or vignettes or getting-started tutorials.

We encourage developers whose packages are not accepted due to overlap to still consider submitting them to other repositories or journals.