Episode 42: Open Sourcing COVID-19 Data with Cindy Wang & Gil Yehuda
5 years ago
Sponsored By
Linode

Panelists
Justin Dorfman | Eric Berry | Richard Littauer

Guest
Cindy Wang (https://www.linkedin.com/in/cindy-wang-365233/)
Sr. Director, Product Management, Yahoo Knowledge Graph, Verizon Media

Gil Yehuda (https://twitter.com/gyehuda)
Sr. Director of Open Source, Verizon Media

Show Notes
Hello and welcome to Sustain! In this episode, we have special
guests, Gil Yehuda and Cindy Wang, who both work for Verizon Media,
which is a combination of a bunch of companies, predominantly Yahoo
and AOL. Gil is Senior Director, Open Source Program and Cindy is
Sr. Director, Product Management, Yahoo Knowledge Graph. We learn
more about Gil and Cindy’s positions with Yahoo, the Yahoo Knowledge
Graph COVID-19 project, data sets, complications with data, and
Vespa (an open source big data serving engine).

[00:02:26] Gil explains to us what coverage he has
and what he’s responsible for in his OSPO (Open Source Program
Office). He also tells us how many repos and orgs he’s managing.
[00:05:29] Cindy tells us all about the Yahoo Knowledge Graph
COVID-19 project. Justin asks about inconsistencies in the data
sets, and Cindy explains.
[00:12:30] Eric asks Cindy whether this resource has been
established as an authority and whether she’s heard feedback or
seen others pointing to it as the authoritative data source.
[00:14:00] Gil explains two levels of complications with data that
he’s observing.
[00:18:30] In regard to financial incentivisation, Eric asks what
their experience has been and whether they have had any feedback
from people who are trying to massage the numbers in their favor.
[00:21:22] Richard wants to know whether any of the code is open
source and whether people can look at it. How can people get
involved, and what was that process like besides the data aspects?
Also, Gil tells us whether he has had any pushback from making any
of this stuff open.
[00:29:01] Gil mentions Vespa.ai, an open
source big data serving engine. Richard asks whether Gil has
thought about long-term plans for sustaining this work: how it
goes forward, which teams will be on it, and whether it will stay
open source for only a year or so.
[00:31:57] Richard asks whether Gil and Cindy plan to onboard
people from the community who are interested in the data and are
helping out, so that they also become maintainers and it’s not
just an internal, Yahoo-only project.
[00:33:08] Eric asks Gil, as a follow-up question, to elaborate on
his remark that he was using these tools internally. Cindy tells us
all about the tools. Eric also asks whether there were any
questions or concerns about licensing, and whether people are
allowed to build commercial applications on top of this data.
[00:40:24] Gil and Cindy tell us how people can get involved in
this project, how to follow along, and where to follow them.

Spotlight
[00:42:20] Richard’s spotlight is Moment.js.
[00:42:39] Eric’s spotlight is Bridgetown, a project built by Jared
White that is an updated version of Jekyll.
[00:43:49] Justin’s spotlights are a thank-you to Ashley Wolf
(https://twitter.com/Meta_Ashley/status/1254803742248955904) for
putting this whole thing together, and a browser extension called
Read Aloud, a text-to-speech voice reader.
[00:44:31] Gil’s spotlight is a project called Denali.

Quotes
[00:04:51] “AOL had an
OSPO and they didn’t have an OSPO and they kind of had an OSPO, but
when we merged together we brought it together and we just continue
to do what we do.”
[00:05:04] “Before OSPO there was Open Source activity because as
you know companies do Open Source even without OSPOs. They just do
Open Source better with OSPOs.”
[00:14:00] “There’s two levels of complications with data that I’m
observing and there’s probably more, because there’s always more to
everything.”
[00:14:48] “But then there’s this other element which is, I don’t
know, maybe it’s the political nature of data.”
[00:16:23] “And I guess all of the paddling that goes on under the
surface of the water to collect that data and to be as accurate as
you can, but also to connect it to the source so that you could
investigate it.”
[00:20:36] “The training set has to be clean, so they actually
spend 80% of their effort in cleaning the data.”
[00:34:28] “So, for example, you look at some states now after
opening, the numbers shot up. So, is it concerning from business
planning perspective? Perhaps.”
[00:37:23] “We have hundreds of millions of entities in this graph
that represent billions of pieces of information that we use
across the company for all types
of things, like how the news stream is ordered.”

Links
Gil Yehuda Twitter (https://twitter.com/gyehuda?lang=en)
Gil Yehuda LinkedIn (https://www.linkedin.com/in/gilyehuda)
Cindy Wang LinkedIn (https://www.linkedin.com/in/cindy-wang-365233)
Yahoo-Covid-19 Data (https://github.com/yahoo/covid-19-data)
Yahoo Covid-19 Dashboard (https://yahoo.github.io/covid-19-dashboard/)
Yahoo Knowledge COVID-19 API (https://github.com/yahoo/covid-19-api/blob/master/README.md)
Yahoo-GitHub (https://github.com/yahoo)
Vespa-Github (https://github.com/yahoo)
Yahoo! Developer Network (YDN) (https://developer.yahoo.com/)
Yahoo! Developer Dash Open Podcast (https://developer.yahoo.com/podcasts/)
Read Aloud (https://chrome.google.com/webstore/detail/read-aloud-a-text-to-spee/hdhinadidafjejdhmfkjgnolgimiaplp?hl=en)
Bridgetown (https://www.bridgetownrb.com/)
Denali (https://denali.design/)
OpenStreetMap (https://www.openstreetmap.org/#map=5/38.007/-95.844)
Leaflet.js (https://leafletjs.com/)

Credits
Produced by Justin Dorfman at
CodeFund
Edited by Paul M. Bahr at Peachtree Sound (https://www.peachtreesound.com/)
Show notes by DeAnn Bahr at Peachtree Sound (https://www.peachtreesound.com/)
Ad Sales by Eric Berry at CodeFund
Special Guests: Cindy Wang and Gil Yehuda.