Episode 236: Eva Maxfield Brown & Boris Veytsman on OSS Dependencies in the Sciences
1 year ago
Eva and Boris dive into their paper "Biomedical Open Source
Software", 'Nebraska' packages, and broader software
sustainability.
Description
Guests
Eva Maxfield Brown | Boris Veytsman

Panelist
Richard Littauer

Show Notes
In this episode of Sustain, host Richard Littauer talks with guests
Eva Maxfield Brown and Boris Veytsman about their co-authored paper,
"Biomedical Open Source Software: Crucial Packages and Hidden
Heroes." The paper identifies crucial but often overlooked software
dependencies in biomedical research. The guests explain how the
study used data from two million papers to map these dependencies,
revealing both well-supported and undermaintained software
components vital to scientific research. They also discuss the
methodological challenges and the concept of "Nebraska packages,"
essential yet potentially undermaintained parts of the software
stack used in both industry and science. The conversation also
covers broader implications for software sustainability, security,
and future research directions, including improving how software
contributions are tracked and recognized within scientific careers.
Press download now to hear more!

[00:01:47] Richard dives into the paper co-authored by Eva and
Boris. Boris explains the origins of the paper, starting from a
workshop at CZI aimed at accelerating science through sustainable
software, which led to the analysis of software used in biomedical
research. He highlights the focus on identifying crucial yet often
unmentioned dependencies of research software, which he labels
“unsung heroes.”

[00:05:22] Boris shares findings from the study, noting that while
many foundational packages were cited, there are significant
packages that remain uncited despite their critical role.

[00:06:43] Eva discusses the concept of “Nebraska packages,”
essential yet potentially undermaintained components of the
software stack used in both industry and science. She also
elaborates on the methodological challenges of deciding which
packages to include in the analysis, particularly because
dependencies vary between users and contexts.

[00:09:42] Richard reflects on the broader implications of the
discussion for the open source community, particularly for software
sustainability and security. Eva emphasizes the importance of
security across all fields and discusses the potential impact of
software bugs on scientific research and the need for robust
software infrastructure.

[00:12:04] Boris comments on the need for well-tested tools in the
scientific community, given that many scientists may lack a strong
background and training in software development.

[00:13:47] Richard quotes from the paper on the absence of cycles
in the network of software packages used in science, which
indicates a more robust design compared to general software. He
questions this in light of earlier comments about scientists not
being great at coding.

[00:14:08] Eva explains that the paper’s finding that the
dependency network is acyclic (a DAG) might seem surprising given
the common perception that scientific software is poorly developed.
She notes that while scientists may not be trained in proper
software packaging, the Python environment helps prevent cyclic
dependencies.

[00:17:31] Richard brings up “Katz centrality,” which is discussed
in the paper. Boris clarifies that the term refers to a measure of
network centrality introduced by Leo Katz and explains how it helps
determine the importance of nodes within a network.

[00:20:13] Richard asks about the practical applications of the
research findings, probing for advice on supporting crucial but
underrecognized dependencies within software ecosystems. Eva
addresses future research directions, including improving
ecosystem-matching algorithms so that software mentions are linked
to the correct ecosystems more accurately.

[00:22:50] Eva suggests expanding the research to domains beyond
biomedicine, considering the different software needs of various
scientific disciplines. Boris discusses the potential for targeted
interventions to support underrecognized contributors in the
scientific software community, with the aim of enhancing their
prestige.

[00:27:22] Richard asks how the research team plans to map
dependencies to individual contributors and track their
motivations. Boris responds that while they have gathered
substantial data from sources like GitHub logs, publishing this
information poses ethical challenges due to privacy concerns.

[00:28:45] Eva discusses her work on linking GitHub profiles to
academic authors using ORCID identifiers to better track
contributions to scientific software.

[00:31:42] Richard brings up the broader impact of the research,
asking whether their study of software package centrality within
the scientific community is unique or whether there are similar
studies at this scale. Eva acknowledges the need for more
comprehensive studies and cites a previous study from 2015 that
analyzed developer networks on GitHub. Boris adds that while there
is extensive literature on scientific citation networks, the study
of dependencies is less explored.

[00:34:38] Find out where you can follow Boris and Eva’s work and
social media online.

Spotlight
[00:37:06] Richard’s spotlight is Deirdre Madeleine Smith.
[00:37:29] Eva’s spotlight is Talley Lambert.
[00:38:02] Boris’s spotlight is the CZI Collaborators.

Links
SustainOSS (https://sustainoss.org/)
SustainOSS Twitter (https://twitter.com/SustainOSS?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor)
SustainOSS Discourse (https://discourse.sustainoss.org/)
podcast@sustainoss.org (mailto:podcast@sustainoss.org)
SustainOSS Mastodon (https://mastodon.social/tags/sustainoss)
Open Collective-SustainOSS (Contribute) (https://opencollective.com/sustainoss)
Richard Littauer Socials (https://www.burntfen.com/2023-05-30/socials)
Eva Maxfield Brown X/Twitter (https://x.com/evamaxfieldb)
Eva Maxfield Brown Website (https://evamaxfield.github.io/)
Eva Maxfield Brown GitHub (https://github.com/evamaxfield)
Boris Veytsman X/Twitter (https://x.com/BorisVeytsman?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor)
Boris Veytsman Mastodon (https://sfba.social/@borisveytsman)
Boris Veytsman LinkedIn (https://www.linkedin.com/in/boris-veytsman-50a1162/)
Chan Zuckerberg Initiative (CZI) (https://chanzuckerberg.com/)
“Biomedical Open Source Software: Crucial Packages and Hidden Heroes” (arXiv) (https://arxiv.org/pdf/2404.06672)
“A large dataset of software mentions in the biomedical literature” (arXiv) (https://arxiv.org/abs/2209.00693)
xkcd Dependency comic 2347 (https://xkcd.com/2347/)
“Dataset Artefacts are the Hidden Drivers of the Declining Disruptiveness in Science” (arXiv) (https://arxiv.org/abs/2402.14583)
Directed acyclic graph (DAG) (https://en.wikipedia.org/wiki/Directed_acyclic_graph)
Katz centrality (https://en.wikipedia.org/wiki/Katz_centrality)
Sustain Podcast-Episode 136: Daniel S. Katz on The Research Software Alliance (https://podcast.sustainoss.org/guests/katz)
Sustain Podcast-Episode 159: Dawn Foster & Andrew Nesbitt at State of Open Con 2023 (https://podcast.sustainoss.org/guests/nesbitt)
Sustain Podcast-Episode 218: Karthik Ram & James Howison on Research Software Visibility Infrastructure Priorities (https://podcast.sustainoss.org/guests/james-howison)
ORCID (https://orcid.org/)
Mapping the Impact of Research Software in Science - A CZI Hackathon (https://github.com/chanzuckerberg/software-impact-hackathon-2023)
Deirdre Smith Academia (https://pitt.academia.edu/DeirdreSmith)
Talley Lambert GitHub (https://github.com/tlambert03)

Credits
Produced by Richard Littauer (https://www.burntfen.com/)
Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/)
Show notes by DeAnn Bahr at Peachtree Sound (https://www.peachtreesound.com/)
Special Guests: Boris Veytsman and Eva Maxfield Brown.
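
Illustration
For listeners who want a feel for the two graph concepts that come up in the episode (acyclic dependency networks and Katz centrality), here is a minimal Python sketch. It is not the authors' code or data: the package names, the edge direction (a package points to what it depends on), and the values of alpha and beta are all assumptions chosen for illustration, and the paper's exact setup may differ.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical toy graph: each package maps to the packages it depends on.
# These names are made up for illustration and do not come from the paper.
deps = {
    "analysis_tool": ["plot_lib", "array_lib"],
    "plot_lib": ["array_lib"],
    "array_lib": ["tiny_parser"],
    "tiny_parser": [],
}


def is_acyclic(graph):
    """Return True if the dependency graph has no cycles (i.e. it is a DAG)."""
    try:
        # static_order() raises CycleError when a cycle exists.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def katz_centrality(graph, alpha=0.1, beta=1.0, iterations=100):
    """Iterate x_i = beta + alpha * sum of x_j over packages j that depend on i.

    A package scores higher when many packages depend on it, directly or
    through longer chains; alpha (an assumed value) discounts longer chains.
    """
    scores = {node: beta for node in graph}
    for _ in range(iterations):
        updated = {node: beta for node in graph}
        for package, requirements in graph.items():
            for requirement in requirements:
                updated[requirement] += alpha * scores[package]
        scores = updated
    return scores


if __name__ == "__main__":
    print("acyclic:", is_acyclic(deps))
    for name, score in sorted(katz_centrality(deps).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

In this toy graph, packages that nothing depends on get the baseline score, while packages deeper in the stack score higher because other packages rely on them directly or indirectly. That is the intuition behind using a centrality measure to surface heavily relied-upon dependencies.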