Sunday, February 15, 2015

Code4Lib Day 2 - Presentations


The second day of Code4Lib was packed with presentations.  A total of fourteen twenty-minute presentations as well as ten five-minute lightning talks filled the day from 9 until almost 5.  As with day one, I'm listing them all here with favorites highlighted.  The presentation number indicates the overall number in the conference, not the number of the presentation on day 2, which is why the first one listed here is presentation number eleven.

Presentation 11: Jennie Rose Halperin - Our $50,000 Problem: Why Library School?

This was a popular and challenging presentation, although it didn't have a great deal to do with code.  Jennie Rose Halperin has a library science degree, although she doesn't work at a library.  She works for Safari Books Online, a company that sells subscriptions to an electronic library of computer books.

Ms. Halperin started her presentation listing a few bits of advice that she got when she told people she was going to go to library school.  The key bit of advice on which she focused her presentation was "Get as many tech skills as you possibly can."

What made this quote notable is that library schools have been extremely light on teaching technological skills.  This isn't because working in libraries requires no technical skills.  This conference was aimed at people working in libraries, most of whom either had significant technical skills or were interested in learning from people who did.  The problem is that as libraries have had to adapt to a rapidly changing information landscape, library schools have had difficulty adapting their curricula to adequately prepare their graduates for that landscape.  Even things that should be shoo-ins as basic technical concepts to be taught in library schools, like relational database design and SQL syntax, have not received adequate focus (based at least on my dated experience with library school and the comments of the presenter).

Meanwhile, the cost of master's degrees (across the board) continues to rise while entry level pay for library jobs has generally stagnated.  The presenter mentioned the absurdity of taking on $40K in school debt to get a job that, on average, pays something close to $32K annually.

Ultimately, there is a lack of connection between degree, cost, and what students need to be learning.  One might think that maybe library students aren't learning higher technical concepts out of a lack of interest, but Halperin argued convincingly that a lack of exposure to technical skills is a greater problem than lack of interest.

To resolve this problem in library schools, Halperin argued that we need scholarships, apprenticeships, curriculum reform, and professional development opportunities in the world of libraries.

Presentation 12: Margaret Heller, Christina Salazar and May Yan - How to Hack it as a Working Parent: or, Should Your Face be Bathed in the Blue Glow of a Phone at 2 AM?

Three working mothers made this impassioned plea for changes in how libraries (and really, all American businesses, particularly with an IT component) treat women and mothers.

Early in their presentation, the three presenters provided this sobering statistic: "an American woman's earnings decrease by 4 percent for every child that she bears…after men have kids, their earnings increase, on average, by 6 percent".  Essentially, men who take time away from their jobs are viewed more positively than women who take time away from theirs.

Part of the problem lies in U.S. policies.  The U.S. has zero days of paid maternity/paternity leave, unlike most other developed countries.  Canada, in particular, has a superior leave policy, making it much easier for parents to get time off work.  It is also easier there to hire temporary workers to fill in for someone on leave.

For those anticipating extended periods of leave, it is a good idea for workers to document their job well and use generic email addresses.  If necessary, the creative division of one worker's responsibilities amongst several coworkers can help as an alternative to hiring temporary workers.

Strict time management and project management tools can help working mothers work shorter days while maintaining productivity.  This is a major concern, as people who used to work long days frequently need to shorten the amount of time spent at work without reducing the amount of work being done.  The presenters have found smartphones to be a major boon, allowing workers to stay in touch with coworkers and family and get work done wherever they are.

Presentation 13: Kevin S. Clarke - Docker? VMs? EC2? Yes! With Packer.io

Packer.io is a tool that can be used to build computer images.  To use it, you create a template designed to work with Packer.io, and it then uses a virtual machine builder (Docker, VirtualBox, and VMware are examples) and a provisioner (Ansible and Bash are examples) to assemble and configure an image.

It and the tools that it works with are designed to work with Linux and other POSIX operating systems.  Using it you can quickly install or reinstall a server configuration.  Normally if you set up a computer with a complicated piece of software and then you need to rebuild the server this can take a lot of work.  Packer.io makes it possible to script out most of this process so that building a new, identical server from scratch can be fast and easy.
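
To make the template idea a little more concrete, here is a minimal sketch in Python that assembles a Packer-style JSON template pairing one builder with one provisioner.  It is only an illustration, not something shown in the talk: the `build_template` helper, the base image name, and the setup commands are all assumptions.

```python
import json


def build_template(base_image, setup_commands):
    """Return a dict shaped like a Packer JSON template.

    One Docker builder creates the image and one shell provisioner
    configures it. Both arguments are illustrative assumptions.
    """
    return {
        "builders": [
            # The builder decides what kind of image gets produced.
            {"type": "docker", "image": base_image, "commit": True}
        ],
        "provisioners": [
            # The provisioner runs inside the new machine to configure it.
            {"type": "shell", "inline": list(setup_commands)}
        ],
    }


template = build_template(
    "ubuntu:14.04",  # hypothetical base image
    ["apt-get update", "apt-get install -y apache2"],  # hypothetical setup
)
template_json = json.dumps(template, indent=2)
print(template_json)
```

Saved to a file, a template along these lines is the sort of thing that gets handed to the `packer build` command, which is what makes rebuilding an identical server repeatable.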

This presentation described the basic process involved in doing this and went through a lot of the details.  It was quite interesting and was something I think I'd be interested in learning more about.

Presentation 14: Axa Mei Liauw and Kevin Reiss - BYOB: Build Your Own Bootstrap

This was an intriguing presentation about creating a more flexible alternative to Twitter's Bootstrap framework.  Bootstrap is an open package of Cascading Style Sheets and JavaScript that can be used to quickly assemble a decent-looking website.  It is quite popular among web programmers who have few design skills (not entirely unlike myself) because it can be used to rapidly put together a responsive website without having to worry too much about design particulars.


The presenters pointed out a few issues that they had with Bootstrap.  Bootstrap markup can be a little dense and heavy, and their BYOB alternative is cleaner in this regard.  Bootstrap can't be made accessible (easy to use for those using screen readers and coping with other disabilities) without the use of a plugin.  Also if a designer does get involved at any point, Bootstrap can be a frustrating environment (the presenters described it as the "Times New Roman" of the web).

To accomplish their BYOB solution, they assembled a variety of web packages that can work nicely together into a single solution.  First they "decoupled" the HTML and CSS in their design.  This basically means that they made sure that the HTML and the CSS used to style it could each stand alone.

Having done this, they used a CSS templating tool called Sass to create new CSS.  The specific brand of Sass they used was SCSS, which looks largely like CSS, making it easier to maintain and less of a conceptual leap.

On top of this they used a Sass library called Bourbon, an alternative grid system called Singularity, a Sass add-on called Breakpoint for handling media queries, and a set of default styles from a package called Bitters.

This combination makes for quick-to-edit code that provides a great deal of flexibility, full accessibility, and clean, easy-to-read markup that doesn't have a great deal of redundancy.

This was a really interesting overview of a variety of powerful tools.  I could have done with a much longer time to cover them in detail (everything felt a little rushed) but it gives me something to research on my own.

Lightning Talks II

Rights Metadata - This presentation described issues that the presenter was having with providing custom access rights on a system using a repository solution called Hydra.  The presentation described their process to create a custom solution that would address these issues.

Code4Lib Japan - The presenter described the 2014 regional Code4Lib conference held in Sabae, Fukui, Japan and the many open initiatives and projects that were discussed and worked on at that conference.

Arduino as a Learning Platform - This presentation, made by Domenic Bordelon of East Baton Rouge Library, was quite interesting to me. Bordelon described the Arduino classes they have had there as a method of teaching programming.  As I have taught an Arduino class at my library and am planning more, I was quite interested in hearing what he had to say.  After his lightning talk I met up with him and shared some information.  He gave me some good ideas which I'm hoping to mix into the classes we are already offering.

PreForma Project - This presentation had something to do with processing of files to go into archives.

PBCore RDF Ontology Hackathon - This hackathon occurred right before the conference.  PBCore is the Public Broadcasting Metadata Framework and RDF is the Resource Description Framework.  This presentation was largely over my head.

RDF, Fedora, and ActiveFedora for Relational Heads - Similar to the previous presentation, this tackled the use of RDF in the Fedora repository system.

Building a Bibframe Catalog - The presenters gave an overview of their work in implementing a new web-based catalog for the Library of Congress using the Library of Congress's new Bibliographic Framework Initiative known as BibFrame.

Mukurtu CMS - The presenters here described an open source content management system called Mukurtu that was designed for hosting digital collections serving indigenous peoples in North America.

How Do We Become Better Developers - This lightning talk looked broadly at the subject of becoming a better software developer.  The presenter's key points were: work in groups of at least two, as it is important to bounce ideas off of someone else; buy training materials; read The Pragmatic Programmer by Andrew Hunt and David Thomas; and review one another's code.

Drupal, Git and Sanity - This presenter described the process of managing the complicated code in the Drupal-based sites managed by their institution by loading it into Git.  That Git repository then can serve as a basic model for all websites.  If they need to bring up a new website they can just do a pull on the Git repository and that will load in all of the current necessary files.

Presentation 15: Jessie Keck and Jack Reed - Making Your Digital Objects Embeddable around the Web

The presenters in this talk had a lot of digital objects and they kept reinventing the wheel of the image viewer.  They haven't always been happy with their approach or their solutions.  They decided to use existing providers for the distribution of their materials (Hulu, Flickr, Twitter, YouTube, Instagram, Slideshare, Speakerdeck) and then use oEmbed to bring the resources hosted on those different services into their website.

Using oEmbed, they not only made it easy for developers to embed multimedia from different sites on their website, they also implemented an oEmbed solution on their end so that others could embed content from their site into other sites.
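
A minimal sketch of both sides of an oEmbed exchange, in Python, may help show what that involves.  It is not the presenters' code: the endpoint, the object URL, and both helper functions are hypothetical.  Only the `url`, `format`, and `maxwidth` request parameters and the `version`, `type`, `html`, `width`, and `height` response fields come from the oEmbed specification.

```python
import json
from html import escape
from urllib.parse import urlencode


def build_oembed_request(endpoint, resource_url, maxwidth=None):
    """Build the URL a consumer would GET from an oEmbed provider."""
    params = {"url": resource_url, "format": "json"}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    return endpoint + "?" + urlencode(params)


def make_oembed_response(resource_url, width, height):
    """Build the JSON a provider could return for a 'rich' resource."""
    html = '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(resource_url, quote=True), width, height
    )
    return json.dumps(
        {"version": "1.0", "type": "rich", "html": html,
         "width": width, "height": height}
    )


# Hypothetical endpoint and object URL, used only for illustration.
object_url = "https://library.example.edu/objects/abc123"
request_url = build_oembed_request(
    "https://library.example.edu/oembed", object_url, maxwidth=600
)
response = json.loads(make_oembed_response(object_url, 600, 400))
print(request_url)
print(response["html"])
```

The consuming site only needs to drop the returned `html` snippet into its page, which is why one provider endpoint can replace a pile of custom viewers.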

Presentation 16: Naomi Dushay and Laney McGlohon - Digital Content Integrated with ILS Data for User Discovery : Lessons Learned

The presenters, from Stanford University -- the same institution as the presenters immediately preceding them -- discussed another aspect of the same problem that the previous presenters had concerned themselves with.  In this case they discussed their process of linking to and embedding objects directly from their library catalog.

Presentation 17: Wayne Schneider - Dynamic Indexing: a Tragic Solr Story

I found this presentation interesting insofar as it involved a public library and described how they addressed some interesting problems.  It didn't seem particularly relevant to me, as it isn't something that I need to do or could do with the access I currently have.

The library wanted better indexing of their materials including live status information (or as close to it as they could get) about the availability of downloadable materials from within their public facing catalog.

Most public-facing catalogs these days do not query the main database directly; instead they query their own separate databases, which is much faster.  Unfortunately, because the information is not coming directly from the catalog system itself, it can wind up being out-of-date.

The solution to this problem provides much of the functionality that they would get if the catalog itself were being searched directly, but it actually runs against a separate database (hence the "faking it" part of the name of their presentation).  They have a rather complicated workflow by which they get a dump of their Horizon ILS, run it through a variety of processes, and then load that output into an index for Solr, a commonly used search server.

By automating this process and doing it frequently, they have managed to keep what appears to be a dynamic index of their catalog up-to-date when it is in fact hosted on a separate server.
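
The general shape of that workflow can be sketched in a few lines of Python.  This is a guess at the pattern rather than the library's actual pipeline: the record fields, the status feed, and the helper names are all hypothetical, and the output is simply a JSON array of documents of the kind a Solr update request can carry.

```python
import json


def to_solr_doc(record, availability):
    """Flatten one ILS record plus its availability into a Solr document."""
    return {
        "id": record["bib_id"],
        "title": record["title"],
        "format": record.get("format", "unknown"),
        # Availability comes from a separate, more frequent status check.
        "available": availability.get(record["bib_id"], False),
    }


def build_update_payload(records, availability):
    """Serialize documents as a JSON array for a Solr update request."""
    return json.dumps([to_solr_doc(r, availability) for r in records])


# Hypothetical rows standing in for an ILS dump and a status feed.
ils_dump = [
    {"bib_id": "b1001", "title": "Moby-Dick", "format": "ebook"},
    {"bib_id": "b1002", "title": "Walden"},
]
status_feed = {"b1001": True}

payload = build_update_payload(ils_dump, status_feed)
docs = json.loads(payload)
print(payload)
```

Rerunning a step like this on a schedule is what makes the separate index look live even though it is always slightly behind the ILS.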

Presentation 18: Jason Thomale - You Gotta Keep 'em Separated: The Case for "Bento Box" Discovery Interfaces

There were some intriguing aspects to this presentation that I had not expected.  The main point was an evaluation of the idea that patrons prefer a single search box with a unified search results list.  The presenters argued pretty passionately that users do not like a unified results list, but rather like results segregated by type (like a bento box has each kind of food in its own compartment).

What was more surprising than the end conclusion was the way that they came to that conclusion.  Using the Google Analytics API (and I had never realized that Google Analytics had an API -- although in retrospect I guess this shouldn't surprise me) they collected two years of achingly detailed data on what users did with search results on their website (when did they click on facets, when did they put in limits, when did they do follow-up searches, what kind of searches resulted in what kind of clicks in each segregated compartment of their bento box interface, etc.).

They did some of the same kinds of things that I've seen done in evaluation of interfaces before, but sucking data directly out of the Google Analytics API and then parsing that data using their own code to find specific patterns was a new thing for me, and quite impressive.
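
As a rough idea of what parsing such data for patterns can look like, here is a small Python sketch that tallies result clicks per bento compartment.  The rows are invented and merely shaped like an analytics export; the column layout and the `clicks_by_compartment` helper are assumptions, not anything the presenters showed.

```python
from collections import Counter

# Hypothetical event rows, shaped like an analytics export:
# (search type, action, bento compartment)
rows = [
    ("keyword", "click", "books"),
    ("keyword", "click", "articles"),
    ("title", "click", "books"),
    ("keyword", "facet", "books"),
    ("title", "click", "books"),
]


def clicks_by_compartment(event_rows, search_type=None):
    """Count result clicks per compartment, optionally for one search type."""
    counts = Counter()
    for kind, action, compartment in event_rows:
        if action != "click":
            continue
        if search_type is not None and kind != search_type:
            continue
        counts[compartment] += 1
    return counts


all_clicks = clicks_by_compartment(rows)
title_clicks = clicks_by_compartment(rows, search_type="title")
print(all_clicks.most_common())
print(title_clicks.most_common())
```

Slicing the same rows by search type is the kind of question that is hard to answer from the standard analytics dashboards but easy once the raw events are in your own code.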

Presentation 19: Jason Casden and Bret Davidson - Beyond Open Source

The presenters here opened up with a brief history of the open source movement. They made the point that "Libtech" (a term used throughout the conference to refer to technology used for library purposes) should be able to thrive in diverse environments.


They argued that libraries have done a good job of creating a collaborative environment, but it favors those who have resources already.  The presenters felt that something needs to be done to try to help libraries with more limited resources implement technologies.


Although better installers and hosted or managed services might help, virtualization technology probably has the most promise in bringing new services to libraries.  Virtual environments and virtual containers, solutions created with software such as Vagrant and Docker, hold particular promise for being able to fill this role.  They argued that it is important to make open source software easier to install, maintain and evaluate than it currently is.


The presenters have been working on a reference statistics collection project called Suma which has seen relatively light adoption in more limited environments (including public libraries).  They feel that most Suma users have the greatest amount of trouble with the install, and by making a Docker installer for it they hope to make it easier for libraries to try out and use this software.

As it happens I've been looking for a good piece of reference statistics collection software so this project holds a lot of interest for me and I look forward to testing it.

Presentation 20: Matt Connolly and Jennifer Colt - Awesome Pi, LOL!

This was a cute presentation describing how the presenters made and implemented an Awesomebox (Awesomebox.io) using a Raspberry Pi.  The Awesomebox is a cool thing that I learned of last year at Computers in Libraries 2014.  It was developed at Harvard with the idea that patrons could easily recommend materials as "awesome" just by scanning the barcode.

Seeing it done with a Raspberry Pi and a generally limited resource set makes me almost interested in implementing it, if I had an idea of where to put it.

Presentation 21: Rebecca Fraimow and Casey Davis - American (Archives) Horror Story: LTO Failure and Data Loss

120 public radio and television stations had lots of analog archival recordings that needed to be digitized and sent to the Library of Congress for public access.  The presenters, who work for WGBH, worked on trying to get this done.

This was an interesting presentation, largely because the whole thing has kind of been a mess.  For whatever reason the digital copies of many of the files have had an extremely high failure rate in getting re-encoded for sending them to the Library of Congress.

They initially had a failure rate of 57% in their digital conversion process.  They wound up having to develop a process for identifying processes that were failing and automatically rerunning them.  Even after that they are still getting a 20% failure rate and they aren't sure why.  It could be that the files are corrupt or it could be a problem with the files that they need to develop a specialized program to work around. 
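
The idea of spotting failed jobs and rerunning them automatically can be sketched as follows in Python.  This is a generic retry loop under stated assumptions, not WGBH's actual process: the file names are made up and `fake_transcode` is a stand-in for whatever conversion step really runs.

```python
def run_with_retries(jobs, transcode, max_attempts=3):
    """Run each job, rerunning failures up to max_attempts times.

    Returns (succeeded, failed) lists of job names.
    """
    succeeded, failed = [], []
    for job in jobs:
        for attempt in range(1, max_attempts + 1):
            if transcode(job, attempt):
                succeeded.append(job)
                break
        else:
            # Every attempt failed; flag the file for manual inspection.
            failed.append(job)
    return succeeded, failed


def fake_transcode(job, attempt):
    """Stand-in for a real conversion step (purely hypothetical)."""
    if job == "tape_b.mov":
        return attempt >= 2  # fails once, then succeeds
    if job == "tape_c.mov":
        return False  # never succeeds, e.g. a corrupt file
    return True


jobs = ["tape_a.mov", "tape_b.mov", "tape_c.mov"]
done, stuck = run_with_retries(jobs, fake_transcode)
failure_rate = len(stuck) / len(jobs)
print(done, stuck, failure_rate)
```

Files that still fail after every retry are the interesting ones, since they are either genuinely corrupt or need special handling, which matches the residual failure rate the presenters were puzzling over.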

Presentation 22: Rob Sanderson and Naomi Dushay - Annotations as Linked Data with Fedora 4 and Triannon (a Real Use Case for RDF!)

This was a third presentation from Stanford University on the embedding/linking of objects; in this case they were focused on maps.  They used an object model called JSON-LD (JSON being the data format used with JavaScript that has become a de facto standard in many circles), which apparently produces a structure in JSON that is directly analogous to RDF.
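
To show what "analogous to RDF" means in practice, here is a deliberately simplified Python sketch that reads one flat JSON-LD node as subject/predicate/object triples.  The annotation, its IRIs, and the vocabulary are all made up for illustration, and `to_triples` handles only the simplest case; it is not a real JSON-LD processor and is not how Triannon works.

```python
import json

# A hypothetical annotation; the IRIs and property names are made up.
annotation = {
    "@context": {
        "body": "http://example.org/vocab/body",
        "target": "http://example.org/vocab/target",
    },
    "@id": "http://example.org/annotations/1",
    "@type": "http://example.org/vocab/Annotation",
    "body": "A note about this map",
    "target": "http://example.org/maps/42",
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(node):
    """Turn one flat JSON-LD node into (subject, predicate, object) triples.

    Handles only string values and a context that maps each key straight
    to a full IRI. It does not distinguish literals from IRIs, which real
    JSON-LD processing does.
    """
    context = node.get("@context", {})
    subject = node["@id"]
    triples = []
    if "@type" in node:
        triples.append((subject, RDF_TYPE, node["@type"]))
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords are handled above, not as properties
        triples.append((subject, context[key], value))
    return triples


triples = to_triples(annotation)
print(json.dumps(annotation, indent=2))
for triple in triples:
    print(triple)
```

The point is that the `@context` lets ordinary-looking JSON keys stand in for full RDF predicates, so the same document is usable by plain JSON tools and by RDF tools.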

Presentation 23: Kathryn Stine and Stephanie Collett - Consuming Big Linked Open Data in Practice : Authority Shifts and Identifier Drift

The presenters here had wanted to synchronize their database with information in the VIAF (Virtual International Authority File).  They had a really large database (9.5 million records) and, as OCLC (the owner of the VIAF) has a hard limit on the number of queries they could do against the VIAF on a daily basis, they had to spread out their process of checking those 9.5 million records over a period of weeks.  The problem is that the VIAF is constantly being updated and their database was constantly being updated, so they had a huge problem with data drift.  They didn't really have a great solution to this problem.

Presentation 24: Audrey Altman, Gretchen Gueguen, and Mark Breedlove - Heiðrún : DPLA's Metadata Harvesting, Mapping and Enhancement System

This presentation was made by three people from the Digital Public Library of America (DPLA).  DPLA stores 8.4 million records, from 23 hubs and 1350+ contributing institutions.  The data they receive is sent using nine different schemas, not all of them XML-based.

DPLA had a complex and problematic process for sucking in data and processing it.  To ease things they developed a new system called Heiðrún, which was named after a goat in Norse mythology who consumes leaves and produces mead (MEtAData).  The presenters filled their presentation with many allusions to goats, which made it pretty fun.  It was a detailed description of the way their complicated system works.
