Part 2/3: Blog series on package health

This post is the second in a three-part series. In the previous post, I discussed why the health of (Python) open source packages should matter to you as a scientist (and as a person who values and uses free and open source tools in your workflow). In this post, I’ll talk more about why collecting metrics is critical both to program development and to the success of open source tools. I’ll wrap up this series with a discussion of what types of package metrics pyOpenSci should be collecting around the free and open source Python packages that you use.

NOTE: all of this is in the context of a conversation on Twitter. It is not a comprehensive perspective on the final metrics that pyOpenSci plans to collect.

Metrics are critical to the development of any program

I’ve now created a few open-science-focused programs from the ground up, including one at NEON and another at CU Boulder. When building a new program, one of the first things that I do (after defining the mission and goals) is to define the metrics that constitute success.

These metrics are critical to define early because:

  • They drive everything that you do.
  • They often take time to develop.
  • They depend on solid baseline data. This baseline data needs to be collected from the start of your program because it often can’t be collected retrospectively.

If you have evaluation or education in your professional background like I do, you may even create a logic model to map activities to outcomes and goals. This logic model helps you define how to collect the data that you need to track outcome success.

Baseline data are critical to collect at the start of building a program or organization

As I am building the pyOpenSci program, I find myself thinking about what metrics around Python scientific open source software we want to track to better understand:

  1. The outcomes of our activities
  2. The overall health of packages in our growing pyOpenSci ecosystem (specific to our organization)
  3. How/if we contributed to improving the health of packages in our ecosystem
  4. How we are impacting the broader scientific Python open source community

Baseline metrics collected from the start will help your future self when running or working in an organization

As mentioned above, collecting metrics from the start of your efforts allows you to hit the ground running with baseline data that you can compare against future data. So while it may not be the work that you want to do, it will help your future self.
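To make the idea of a baseline concrete, here is a minimal sketch of comparing a later snapshot of a package against the first one recorded. The `MetricSnapshot` class, its fields, the package name, and all of the numbers are hypothetical and chosen only for illustration; they are not metrics that pyOpenSci has committed to collecting.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricSnapshot:
    """One dated observation of a package (all field names are hypothetical)."""

    package: str
    taken_on: date
    open_issues: int
    contributors: int


def change_since_baseline(baseline: MetricSnapshot, later: MetricSnapshot) -> dict[str, int]:
    """Return how each numeric metric changed between two snapshots."""
    if baseline.package != later.package:
        raise ValueError("snapshots describe different packages")
    return {
        "days_elapsed": (later.taken_on - baseline.taken_on).days,
        "open_issues": later.open_issues - baseline.open_issues,
        "contributors": later.contributors - baseline.contributors,
    }


# Made-up numbers for an imaginary package, for illustration only.
baseline = MetricSnapshot("example-pkg", date(2022, 1, 1), open_issues=40, contributors=3)
later = MetricSnapshot("example-pkg", date(2023, 1, 1), open_issues=25, contributors=7)
print(change_since_baseline(baseline, later))
```

Without the first snapshot there is nothing to subtract from, which is why the baseline has to be recorded at the start rather than reconstructed later.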

For pyOpenSci, collecting metrics allows us in the future to evaluate our programs and adaptively change things to make sure we are getting the outcomes that we want.

Outcomes such as:

  • Scientists being better able to find packages that are maintained to support their workflows
  • Improved documentation in packages as a result of our reviews

But what metrics should we collect about scientific Python packages?

In a previous post, I spoke generally about why open source should matter to you as a scientist and as a developer or package maintainer.

To better understand what data we should be collecting to track our packages’ health over time, I went to Twitter to see what my colleagues around the world had to say. That conversation resulted in some really interesting insights.

Most importantly, that conversation allowed me to begin to break down and group metrics in terms of pyOpenSci goals.

In my next blog post, I will summarize the discussion that happened on Twitter.

Goals for package metrics

We hope that:

  • Peer review improves Python package structure and usability.
  • Peer review in some way supports maintenance and/or responsible archiving when a package reaches its end of life.
  • Over time, the package is improved and maintained.
  • Over time, we are able to facilitate outside contributions to these packages.

We need metrics to understand things like:

  • Community adoption of the package (are scientists using it?)
  • Maintenance level of the package (are maintainers still working on it and fixing bugs?)
  • Infrastructure (are tests set up to help identify if contributions break things?)
  • Usability (is the package documented in a way that helps users quickly get started?)

Four metric categories that pyOpenSci cares about

Based on the feedback on Twitter, which is summarized in the next post, and on my own sense of what pyOpenSci needs to consider as a starting point, I organized the conversation into four broad categories:

  1. Infrastructure
  2. Maintenance
  3. Community adoption (and usability??)
  4. Diversity, Equity, Inclusion, and Accessibility (DEIA): discussed less on Twitter but integral to pyOpenSci goals.

These four categories are not by any means mutually exclusive. They are merely a way to begin to organize an engaging and diverse conversation. All of the categories are priorities of pyOpenSci.
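One way to picture categories that overlap is to let a single metric carry more than one category tag. The sketch below uses Python's standard `enum.Flag` for that. The metric names and their category assignments are hypothetical examples, not a list that pyOpenSci has settled on.

```python
from enum import Flag, auto


class Category(Flag):
    """The four broad categories; a metric may belong to several at once."""

    INFRASTRUCTURE = auto()
    MAINTENANCE = auto()
    COMMUNITY_ADOPTION = auto()
    DEIA = auto()


# Hypothetical metric names and assignments, for illustration only.
METRIC_CATEGORIES = {
    "has_automated_tests": Category.INFRASTRUCTURE | Category.MAINTENANCE,
    "days_since_last_release": Category.MAINTENANCE,
    "number_of_contributors": Category.MAINTENANCE | Category.COMMUNITY_ADOPTION,
}


def metrics_in(category: Category) -> list[str]:
    """List the metric names tagged with the given category, sorted by name."""
    return sorted(name for name, tags in METRIC_CATEGORIES.items() if category in tags)


print(metrics_in(Category.MAINTENANCE))
print(metrics_in(Category.INFRASTRUCTURE))
```

Here the imaginary `has_automated_tests` metric shows up under both Infrastructure and Maintenance, which is the kind of overlap described above.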

Click here to read more about how these metric categories evolved within the conversation on Twitter!


Leave any feedback that you have in the comments section below.

Categories: blog-post , highlight , peer-review , python-packaging

