...

We envision a future where all of the key data resources and their providers appear to us as a coordinated team ready to take on our question of the day. For instance, as we become aware of an emerging disease outbreak, we might visit the Global Health Observatory (GHO) from the WHO to get an overview of the latest reported infections. We wish to quickly explore a hypothesis regarding infection propagation and vectors. In this world, the GHO would have links to other key portals, such as the Gridded Population of the World and the Global Biodiversity Information Facility (GBIF), which we can access to refine our hypotheses. As we discover the social factors that play a role in the spread of disease, we realize that we will need to be able to access and integrate information about roads, schools, labor statistics, places of employment, etc. In particular, we have an idea of the kinds of observations or measurements we need: say, both the locations of schools and their populations. Starting with the GHO site, we can then trigger a query for the appropriate data, available from anywhere in the world, that corresponds geographically to the area near the emerging outbreak.

We might pause a moment in this story of an emerging outbreak to imagine how a university researcher might pose similar questions a year before the outbreak in an effort to predict its occurrence. Perhaps she is browsing the GBIF portal, exploring populations of reptiles as a possible disease vector, and realizes that she needs data regarding roads, schools, or labor statistics. From the GBIF website, she clicks a button that submits a query for such datasets, available from anywhere in the world, that overlap with the range of particular reptiles. Indeed, there might be a dozen different data portals that she could visit, each able to tap into a global data search engine to find data that correlates with the data provided by that portal.

But now, in our present, we have an actual outbreak to understand. We have done some initial browsing and searching to the point of having a refined hypothesis about the infection propagation, and we have identified some candidate datasets we can leverage. Through either refined searches or filtering of our search results, we are able to find a database that can tell us the locations of schools near the outbreaks, as well as a second data source that lists the schools' populations. Both of these data sources can export their data into a common annotated table format, so we download the data directly into our RStudio environment on our local laptop, where we can combine the school locations and populations into a single table. Our R environment already has a module for pulling range data from GBIF, so from RStudio we create a table containing range information for our suspect reptile species. Our initial analysis, through quick plots of the data, shows some interesting correlations, so now we need to see how this data relates to current infection incidents.
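
The step of combining school locations and populations into a single table amounts to a join on a shared school identifier. The story places this step in R, where a table merge would do the same job; what follows is only a minimal sketch of the join logic in Python, using the standard library. The column names (`school_id`, `name`, `lat`, `lon`, `students`) and the sample rows are made up for illustration and do not come from any real data source.

```python
import csv
import io

# Hypothetical exports from the two data sources. The column names and
# the rows themselves are invented purely for illustration.
LOCATIONS_CSV = """school_id,name,lat,lon
S1,North Primary,1.30,36.80
S2,River Secondary,1.35,36.90
S3,Hill Academy,1.40,36.70
"""

POPULATIONS_CSV = """school_id,students
S1,420
S2,815
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left, right, key):
    """Inner join: keep left rows whose key value also appears in right."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the columns of both rows into one combined record.
            joined.append({**row, **match})
    return joined


schools = join_on_key(
    read_table(LOCATIONS_CSV), read_table(POPULATIONS_CSV), "school_id"
)
for school in schools:
    print(school["name"], school["lat"], school["lon"], school["students"])
```

An inner join is used here so that only schools present in both sources survive; a school with a known location but no population figure (the hypothetical `S3`) is dropped, which may or may not be the right choice for a given analysis.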

...