Monday, April 14, 2014

Computers in Libraries 2014 - Day 2 - Library Data Mashups

In this session Mike Crandall, Samantha Becker, and Becca Blakewood from the Information School of the University of Washington described what data mashups are, where to find data, and how to mash it up.

They began by describing the concept of mashups in general, where, for instance, audio from one source is mixed with video from a different source to create a new, doubly-derivative work that is entertaining in a way that neither of the original works is on its own.  Data mashups do this with data, and can be fun in their own way.  The presenters suggested that data mashups can be an effective way to advocate for your library, since mixing data from different sources can build powerful statements about library service need, reach, and utilization.

On the topic of data sources to mash up, the presenters suggested these as a starting point:

  • National Sources: IMLS Public Library Survey, Edge Initiative, Impact Survey, Census Data
  • Local Sources: Community indicators, City/county data, community anchor institutions or agencies
  • Your Sources: Library use statistics, circulation statistics, patron surveys

After discussing these and some other sources, they went on to describe approaches to mashing up data.  The first approach is to get a "conceptual mash."  A conceptual mash doesn't make for a pretty graph, largely because it can be a bit of a mismatch of data.  However, it can point in a direction that is ripe for further intelligence gathering.  The following example was given of a conceptual mash.

The presenters took data at the national level, for the state of Texas, and locally for New Braunfels, Texas.  At each level they compared the figures for married-couple families, never-married residents, families with elementary school children, Hispanic households, and non-English-speaking households.  That data indicates a higher proportion of Hispanic residents in New Braunfels than at the other levels.  This information can then be compared against the Pew Library Typology report, which indicates that statistically more Hispanics are in the "Distant Admirers" group than in any other group.  Taken together, this suggests there may be something to be gained by reaching out to the Hispanic community in New Braunfels.  This information needs to be validated, but it creates a working hypothesis that can be explored.

A second type of data mashup is an "Actual Mash."  This is where there are datasets from different sources that can be directly compared or joined using data points that are in both sets.

As an example of this, the presenters looked at the Edge assessment of library technology access and linked it with data from the Public Libraries Survey to try to determine whether there is a correlation between a high Edge score and library size.  The Public Libraries Survey had library size available, the Edge assessment information had Edge scores available, and both had the names of the libraries being evaluated.  That meant the two data sets could be joined on library name to create a larger, combined data set.
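To make the idea of joining on a shared field concrete, here is a minimal sketch in Python.  It is not the presenters' method, and every library name, number, and field name in it (`population_served`, `edge_score`) is made up for illustration; the real survey files will have their own column names.  The only point is that two lists of records can be combined when both carry the library's name.

```python
# Hypothetical records: names, numbers, and field names are invented
# for illustration and do not come from either real data set.
pls_rows = [
    {"library": "Alpha Public Library", "population_served": 12000},
    {"library": "Beta County Library", "population_served": 250000},
    {"library": "Gamma Town Library", "population_served": 4000},
]
edge_rows = [
    {"library": "alpha public library ", "edge_score": 610},
    {"library": "Beta County Library", "edge_score": 890},
    {"library": "Delta City Library", "edge_score": 700},
]


def normalize(name):
    """Lowercase and collapse whitespace so trivial spelling differences still match."""
    return " ".join(name.lower().split())


def join_on_name(left, right, key="library"):
    """Inner join: keep only libraries that appear in both lists."""
    right_by_name = {normalize(row[key]): row for row in right}
    joined = []
    for row in left:
        match = right_by_name.get(normalize(row[key]))
        if match is not None:
            # Fields from `row` win on conflict, so the left-hand
            # spelling of the library name is kept.
            joined.append({**match, **row})
    return joined


mashed = join_on_name(pls_rows, edge_rows)
for row in mashed:
    print(row["library"], row["population_served"], row["edge_score"])
```

Libraries that appear in only one of the two lists (Gamma and Delta here) drop out of the result, which is worth checking for in real data, since names are often spelled differently from one source to the next.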

In this case they pointed out some pitfalls of not looking at data very carefully.  A simple bar graph would seem to indicate a correlation between Edge score and size, but a closer analysis shows that a very small number of extremely large libraries generally have extremely high Edge scores, while libraries across the rest of the spectrum show little correlation between size and Edge score.
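The pitfall described here can be illustrated with a short Python sketch.  The numbers are invented, the 500,000 cutoff for "extremely large" is an arbitrary assumption, and the Pearson coefficient is just one common way to measure correlation, not necessarily what the presenters used.  The sketch computes the correlation once with all libraries and once with the few very large ones set aside.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented (population served, Edge score) pairs, for illustration only.
libraries = [
    (2000, 620), (5000, 480), (9000, 700), (15000, 510),
    (30000, 650), (60000, 540),
    (900000, 950), (1200000, 980),  # a few extremely large libraries
]

LARGE_CUTOFF = 500000  # arbitrary, assumed threshold for "extremely large"

sizes = [size for size, _ in libraries]
scores = [score for _, score in libraries]
r_all = pearson(sizes, scores)

rest = [(size, score) for size, score in libraries if size < LARGE_CUTOFF]
r_rest = pearson([size for size, _ in rest], [score for _, score in rest])

print("all libraries:       r = %.2f" % r_all)
print("without the largest: r = %.2f" % r_rest)
```

With this made-up data the first figure comes out strongly positive and the second close to zero, which is the same pattern the presenters warned about: a couple of outliers can make the whole data set look correlated.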

With the amount of data available from different agencies and from libraries themselves, there are plenty of opportunities to create complex actual mashes using a variety of data sets and matching points.

The presentation had a handout with data sources on one side and tools for data analysis on the other.  There was a brief overview of some of the many tools available.  Here is a quick list of the tools mentioned on that handout:
