Thoughts of a Reluctant Wizard

Saturday, November 21, 2015

LITA Forum 2015 - Patron Privacy

The second and third sessions I attended at this year's LITA forum were both on patron privacy issues, so I'll combine them here into a single post.

The first of these sessions was presented by Todd Carpenter of the National Information Standards Organization (NISO). He explained the process NISO has been going through, in collaboration with libraries, publishers, and vendors, to establish a set of standards for how much data is collected from users, what restrictions are placed on that collection, and how the data can be used.

This topic has been a kind of Catch-22 for libraries for the past few years. Libraries, by ethical mandate and by law, do not retain information about patrons and the materials they borrow beyond the point at which the materials have been returned and any fines have been paid. At the same time, patrons who don't know that libraries have this policy, or who don't understand its full ramifications, get frustrated that we can't do something like Amazon does and recommend new titles based on their reading history. This makes libraries look backward in their ability to provide this kind of service, when it's at least partly because our hands are tied when it comes to collecting and using data in this way.

Given this history, any change to library practice is controversial within libraries, but there is also a strong feeling, particularly at public libraries, that we need to do something about the fact that we haven't been able to offer a service our heaviest users expect us to offer.

The NISO initiative attempts to put forward solutions to this problem. The initial process involved a series of meetings with different stakeholders to hammer out basic principles. These principles have been collected into a draft document, to be published soon, that includes concepts such as: patrons need to opt in to such systems (no one has information collected without their express consent); data needs to be anonymized as much as possible; data should be used only to provide service to those individuals; and the information should be available to the users themselves.
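To make a few of these principles concrete, here is a minimal sketch of a reading-history store that records nothing by default, keeps history only for patrons who opt in, deletes it when they opt out, and lets patrons see what was kept. It is not from the NISO draft. The class and method names are hypothetical, and it uses HMAC-SHA256 pseudonyms in place of patron barcodes. Strictly speaking that is pseudonymization rather than full anonymization, since anyone holding the key can link a patron to their history.

```python
import hashlib
import hmac
from collections import defaultdict


class OptInHistory:
    """Hypothetical opt-in reading-history store (illustration only)."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key            # should be stored apart from the history data
        self._opted_in = set()            # pseudonyms of consenting patrons
        self._history = defaultdict(list) # pseudonym -> list of titles

    def _pseudonym(self, patron_id: str) -> str:
        # Stable token that cannot be reversed without the secret key
        return hmac.new(self._key, patron_id.encode(), hashlib.sha256).hexdigest()

    def opt_in(self, patron_id: str) -> None:
        self._opted_in.add(self._pseudonym(patron_id))

    def opt_out(self, patron_id: str) -> None:
        # Withdrawing consent also deletes whatever was already collected
        token = self._pseudonym(patron_id)
        self._opted_in.discard(token)
        self._history.pop(token, None)

    def record_checkout(self, patron_id: str, title: str) -> bool:
        token = self._pseudonym(patron_id)
        if token not in self._opted_in:
            return False                  # default behavior: retain nothing
        self._history[token].append(title)
        return True

    def my_history(self, patron_id: str) -> list:
        # The patron can always see what has been kept about them
        return list(self._history.get(self._pseudonym(patron_id), []))
```

A recommendation feature would then read only from `my_history`, so patrons who never opt in contribute no data at all.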

This initial document will be a high-level overview of the problems and will be followed by other documents establishing best practices and other kinds of implementation details.

This was followed by a session with Jim Jonas of the University of Wisconsin. His talk focused on privacy education, arguing that in the Internet age "privacy literacy" should be as much a part of library service as literacy has historically been. Jim described the overall state of the Internet and modern data collection that makes privacy literacy an important topic, and explained why libraries are particularly well situated to provide this kind of information. He then walked through an hour-long class he has offered in which he introduces users to some privacy concepts, why privacy is important, and some approaches to maintaining one's privacy. He also plans to offer a follow-up class on specific tools for managing privacy.

Jim also provided copies of the presentations he uses for his classes as well as a privacy resources page that he maintains for the University of Wisconsin.

I found both of these presentations informative and agree that we should offer programs on privacy. One concern raised by an attendee, which I thought was well stated, is that it is hard to get people excited about going to classes like this. Finding a good angle to make the class appealing is probably one of the biggest challenges in offering this kind of programming.

Saturday, November 14, 2015

LITA Forum 2015 - Minecraft in Real Life Club

In the first session I attended, Mary Glendenning, Library Director at Middletown Free Library in Pennsylvania, discussed the programs her library has been running to engage kids who love Minecraft. For the most part, though, these programs haven't used Minecraft as a piece of software. Instead they've focused on derivative projects in which kids use the resources in the library's makerspace to create physical objects that represent, or are modeled on, objects found in the game.

Ms. Glendenning first described the game of Minecraft and then explained why they had chosen it as something to build their programs around. The game's popularity, particularly with pre-teen boys (a demographic that is difficult for libraries to attract), and its open-ended nature were a couple of the things that made it attractive to them. Also of critical importance is that the game touches on four skill areas identified as important by the Institute of Museum and Library Services (IMLS): learning and innovation skills; information, media and technology skills; 21st century themes; and social and cross-cultural skills. These are also the skills they wished to target with their programming.

The presentation then provided a number of examples of projects kids have worked on and how those projects either expanded on concepts introduced in Minecraft or used Minecraft as a starting point. For example, in the game it's possible to build a kind of circuit using a material called "redstone." To show how that concept applies in the real world, they helped kids build real electronic circuits using kid-friendly breadboards and electronic components. Another circuitry project was a sewing project in which kids made a Minecraft torch using soft circuits in fabric, which sounds like something we could do.

Another project used Minecraft and some extra software to create, export, and print 3D designs. I found this particularly interesting as we have Minecraft installed on public computers and kids frequently have difficulty creating designs in more conventional 3D CAD software.
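The talk didn't name the software used, but the underlying idea is simple enough to sketch: a block-based build is a set of unit cubes, and a 3D printer wants a surface mesh, commonly in STL format. The sketch below is a hypothetical converter, not the tool the library used. It turns a set of `(x, y, z)` block positions into an ASCII STL and skips faces shared by two neighboring blocks, so only the outer surface is exported.

```python
def voxels_to_ascii_stl(blocks, name="minecraft_model", size=1.0):
    """Convert (x, y, z) integer block positions into an ASCII STL string.

    Faces shared by two blocks are hidden and skipped, so the mesh
    describes only the outer surface of the build.
    """
    solid = set(blocks)
    # Neighbor offset (also the outward normal) and the face's four corners,
    # listed counter-clockwise as seen from outside the cube.
    faces = [
        ((1, 0, 0), [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)]),
        ((-1, 0, 0), [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)]),
        ((0, 1, 0), [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)]),
        ((0, -1, 0), [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)]),
        ((0, 0, 1), [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]),
        ((0, 0, -1), [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]),
    ]
    lines = [f"solid {name}"]
    for (x, y, z) in sorted(solid):
        for (dx, dy, dz), corners in faces:
            if (x + dx, y + dy, z + dz) in solid:
                continue  # face is buried against another block
            pts = [((x + cx) * size, (y + cy) * size, (z + cz) * size)
                   for cx, cy, cz in corners]
            # Split each square face into two triangles with the same winding
            for tri in ((pts[0], pts[1], pts[2]), (pts[0], pts[2], pts[3])):
                lines.append(f"  facet normal {dx} {dy} {dz}")
                lines.append("    outer loop")
                for vx, vy, vz in tri:
                    lines.append(f"      vertex {vx} {vy} {vz}")
                lines.append("    endloop")
                lines.append("  endfacet")
    lines.append(f"endsolid {name}")
    return "\n".join(lines)
```

A single block produces 12 triangles (6 faces × 2). Two blocks side by side produce 20, because the face they share is never written out.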

Kids also created Minecraft stories using video production and audio editing software. Other projects included making Minecraft food from Rice Krispies treats and building a Minecraft Christmas tree from discarded boxes.

This presentation gave me a lot of ideas for ways we might connect the Minecraft software we already have, and kids' interest in the game, with the resources available in our library and our makerspace.

Friday, November 13, 2015

LITA Forum 2015 - Opening Keynote

The 2015 LITA (Library and Information Technology Association) National Forum kicked off with a keynote by Lisa Welchman, president of Digital Governance Solutions. The first part of the keynote was an entertaining, personal reflection on how Ms. Welchman arrived at her current position and how those experiences prepared her for the world we now find ourselves in and its peculiar problems.

Ms. Welchman originally dreamed of being an opera singer; her parents, concerned about the reliability of that career choice, made her learn to type. In college in the late '80s she dated someone who was studying computer science, and he asked her to learn the Unix command line and the console-based email program Pine so she could correspond with him. She asked her grandfather for a Macintosh Plus as a gift (largely because of how it looked) and learned to use HyperCard to create a database of the arias she knew how to sing.

These experiences prepared her to be a well-paid temp who could do Lotus Notes development. That experience led to an opportunity to work with the new company Netscape, which moved her to California, and that in turn landed her a job at Cisco working on its website in the late '90s (a site she once accidentally switched over to Japanese).

Life at Cisco, a company founded on computer networking and the Internet, showed her how badly people have managed to organize what they put on the Internet, which nowadays is pretty much everything. She founded the company she runs today on the insight that if Cisco can't figure out how to properly organize and structure its information for the Internet, a lot of other organizations can't either, and they need a company that specializes in helping with this problem.

With this background established, Ms. Welchman explained that although the Internet is new and different, in many ways it has much in common with developments that came before it. These developments follow a pattern. There are those who feel threatened by a technology, dismiss it without recognizing its inevitability, and are swept away as it is adopted. There are also those who have a vision of what the technology might lead to, although they tend to overestimate certain parts and underestimate others. Eventually, the chaos that develops at the beginning of widespread adoption gives way to a need for standards and regulation.

Ms. Welchman argued that, rather than being restrictions that limit a technology's potential, standards and restrictions focus it, allowing it to reach its full potential. Governance is the process of providing a framework that determines the strategy and policy for achieving standardization through the efforts of a team. That team is itself a collaboration among different individuals at different levels. A core team writes the policies and standards. It works closely with the people responsible for actually making things, with the liaisons who connect an organization's "siloed" departments with the rest of the team, and with the outside vendors that assist the team.

Ms. Welchman also defined policy and standards in a digital environment. Policy consists of high-level statements of beliefs, goals, and objectives, made to comply with laws, manage risk, or drive competitive advantage. Digital standards are the formal specifications that guide what is to be done with regard to the various aspects of digital publication and development; they can be divided into network and infrastructure, design, editorial, and publishing or development concerns.

I found this talk to be quite interesting and I enjoyed seeing the emphasis on establishing standards for information management, organization and presentation provided here, particularly coming from someone outside of the library field.

Sunday, February 15, 2015

Code4Lib Day 3 - Closing Sessions

Following Andromeda Yelton's keynote I attended two more sessions before the end of the conference: the third lightning round and an Ask Anything session where a group of about 10 people bounced questions off of one another.

Lightning Talks III

OpenGeoMetadata - This project attempts to address the problem that no one creates metadata and no one shares metadata. It uses GitHub as a platform, uses an extensible schema, and is standards-agnostic. There are shared toolkits in Python and Ruby.

IIIF Image Field Drupal module - This was a demo of a method for integrating images from a repository into Drupal using IIIF (the International Image Interoperability Framework).
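Much of what makes IIIF easy to integrate is that an image request is just a URL with a fixed pattern from the IIIF Image API (version 2.x shown here): `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds those URLs. The server address is a placeholder, and this is not the Drupal module's code, just the URL convention it relies on.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x request URL.

    base is the server plus any path prefix (placeholder in the examples).
    The identifier is percent-encoded, including slashes, as the spec requires.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document describing available sizes and tiles."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

For example, `iiif_image_url("https://images.example.org/iiif", "ark:/12345/abc", size="!400,400")` asks for the whole image scaled to fit inside 400×400 pixels, which is a natural value for an image field's thumbnail display.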

Clustering Moving Image Works - The presenter described the problem of grouping records for moving image works that exist in multiple formats, or in combinations of formats, and was hoping someone in the audience might have some good ideas.

Islandora Fedora 4 - This presentation described work on getting the Islandora digital asset framework to run on Fedora 4, the current version of the Fedora digital repository software.

Measure the Future - By far my favorite of this round of lightning talks. The presenter described his work with SparkFun to create cheap, open hardware running open source software for tracking library usage. The system will use machine vision to anonymously track physical usage of a building. It could be used to evaluate stack usage and plan staffing, which is how we could use it too. They will eventually have open tutorials on installation, configuration, and usage, and they are working to make sure the system raises no privacy concerns.
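One common way to keep this kind of tracking anonymous is to throw away individual detections as early as possible and keep only aggregate counts. The sketch below illustrates that idea and is not the project's actual software. It assumes a hypothetical detection format of `(timestamp, x, y)` and rectangular zones such as "stacks", and it reduces the detections to counts per zone per hour.

```python
from collections import Counter
from datetime import datetime


def aggregate_detections(detections, zones):
    """Reduce raw detections to counts per (zone, hour).

    detections: iterable of (timestamp: datetime, x: float, y: float)
    zones: dict of zone name -> (x_min, y_min, x_max, y_max)
    Only the counts are returned, so no individual's path survives.
    Detections outside every zone are dropped.
    """
    counts = Counter()
    for ts, x, y in detections:
        for name, (x0, y0, x1, y1) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                hour = ts.replace(minute=0, second=0, microsecond=0)
                counts[(name, hour)] += 1
                break  # zones are assumed not to overlap
    return counts
```

Counts like `counts[("stacks", datetime(2015, 2, 11, 9))]` are exactly the numbers needed for the stack-usage and staffing questions above.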

Low-Hanging Fruit of Web Accessibility - This was a nice, short presentation encouraging attendees to check their websites for accessibility problems. One of the easiest tests is to try to browse your site without a mouse and see how frustrating the experience is.
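The keyboard test has to be done by hand, but a few other low-hanging problems can be found automatically. This minimal sketch was not part of the talk. It uses Python's standard `html.parser` to flag two easy cases: images with no `alt` attribute at all, and positive `tabindex` values, which reorder keyboard navigation. The `lint` helper name is hypothetical, and this is no substitute for a full accessibility checker.

```python
from html.parser import HTMLParser


class AccessibilityLint(HTMLParser):
    """Collects a few easy-to-detect accessibility problems."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        line, _ = self.getpos()
        # alt="" is valid for decorative images; a missing alt is not
        if tag == "img" and "alt" not in attrs:
            self.problems.append(f"line {line}: <img> without alt text")
        # tabindex greater than 0 pulls elements out of the natural keyboard order
        tabindex = attrs.get("tabindex")
        if tabindex is not None and tabindex.strip().lstrip("-").isdigit() and int(tabindex) > 0:
            self.problems.append(f"line {line}: positive tabindex changes keyboard order")


def lint(html_text):
    parser = AccessibilityLint()
    parser.feed(html_text)
    parser.close()
    return parser.problems
```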

Planning for the Data Schlep - This was a general discussion of moving data from one environment to another. The presenter recommended not treating the code you write for a migration as disposable, but as real code that you test. He also recommended the tools he used for the process, while noting that others would work as well.
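Treating migration code as real code mostly means writing the transformation as a plain function and keeping checks next to it that you rerun on every pass. The sketch below is a hypothetical example of that habit. The legacy field names (`RecordID`, `Title`, `Subjects`) and the target schema are invented for illustration.

```python
def migrate_record(old):
    """Map one record from a hypothetical legacy export to a new schema."""
    return {
        "id": str(old["RecordID"]).strip(),
        "title": old.get("Title", "").strip() or "[untitled]",
        # Legacy subjects are one semicolon-separated string; drop empty pieces
        "subjects": [s.strip() for s in old.get("Subjects", "").split(";") if s.strip()],
    }


def verify_migration(old_records, new_records):
    """Sanity checks worth keeping alongside the migration and rerunning."""
    errors = []
    if len(old_records) != len(new_records):
        errors.append(f"count mismatch: {len(old_records)} -> {len(new_records)}")
    ids = [r["id"] for r in new_records]
    if len(ids) != len(set(ids)):
        errors.append("duplicate ids after migration")
    return errors
```

Because `migrate_record` is a pure function, it is easy to test against a handful of awkward sample records before running it on the full data set.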

Archiving the Silenced - The presenter created a database presenting information about the history of eugenics in Alberta (eugenicsarchive.ca). A quick demonstration of the site was pretty impressive.

The Great Migration: Fedora 4 - This presentation described the need to upgrade installations of the Fedora repository software from version 3 to version 4 and some resources to assist users in doing this.

Ask Anything

I found this to be a valuable session where attendees were able to ask others about problems they have. I asked for advice about a server upgrade I need to do and got some very good advice which I intend to follow.

Code4Lib Day 3 - Closing Keynote - Andromeda Yelton

Andromeda Yelton, a former middle school Latin teacher with degrees in mathematics, classics, and library science, who now does contract coding work and teaches librarians how to program, gave the closing keynote at Code4Lib 2015.

Yelton started her keynote by asking, "Why did the web work?" Her answer had two parts: first, the web was a resource with a determined agnosticism about what it was good for (i.e., it could hold all kinds of information); second, when Tim Berners-Lee designed it, he gave away all of the necessary code so that anyone could run it.

Tim Berners-Lee had no idea that the WWW would become what it is now when he made it public, and Yelton used that as a jumping-off point to observe that what a thing evolves into is frequently not what we would have predicted at the outset.

To further illustrate this idea, Yelton told a story from a nerd camp she attended 24 years ago. A friend (who later became her husband) bought a stuffed mallard duck to play a part in a Monty Python skit the campers were going to put on. The skit got the kibosh because the Monty Python-style commentary they also wanted to do on it was too profane, but the duck lived on. On a whim Yelton made a sign-up list for the duck, and many of her friends and acquaintances put their names on it. Each year the duck has been passed on to the next person on the list, and it has become something that ties together a group of people who would likely otherwise have lost contact.

In this first part of her talk, the concept of wanderlust emerged. She described librarians who attend conferences as having a sense of wanderlust. She also described librarians as facilitators of wanderlust and libraries themselves as places that inspire it. Libraries lead to unexpected places and results. They introduce people to new ideas, places, people, and experiences without any idea of what kind of fruit those experiences will bear.

Consequently, when we write code we are writing to facilitate wanderlust, or "architect for wanderlust" as Yelton put it in the title of her talk. Software that inspires an interest in different things and spurs this wanderlust is, by these terms, library software. Software that does not, is not.

By this standard, some of the software most associated with libraries, the difficult-to-use catalogs that intimidate patrons, is not library software. It does not belong in libraries and does not exemplify library ideals.

What is "library software" then? Yelton's answer covered a wide swath:

Homebrew, a package manager for Macintosh that allows users to easily install the wide array of software available for Linux/Unix computers on the relatively closed Mac platform
The World Wide Web itself
A website that she and others hacked together at a Harvard hackathon, which compares the subject headings of resources with their StackScore to see which of the highest-rated items actually deal with topics related to underrepresented groups: women, African Americans, and LGBT people
The New York Public Library's many interesting digital projects with open APIs
Congressbot, which checks anonymous Wikipedia edits to determine whether agencies with vested interests have edited articles
Code that lets us find stories that matter
The lovingly crafted Zoia bot for the Code4Lib IRC channel, which has been a collaborative effort
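Congressbot works because anonymous Wikipedia edits are attributed to an IP address rather than a username. The core check such a bot needs is whether that address falls inside a watched organization's network range. The sketch below uses Python's standard `ipaddress` module. It is not the bot's actual code, and the ranges in it are placeholder documentation blocks (RFC 5737), not any real agency's addresses.

```python
import ipaddress

# Placeholder ranges only (RFC 5737 documentation blocks), NOT real agency ranges.
WATCHED_RANGES = {
    "Example Agency": [ipaddress.ip_network("192.0.2.0/24")],
    "Example Office": [ipaddress.ip_network("198.51.100.0/25")],
}


def who_edited(editor):
    """Return the watched organization an anonymous edit came from, or None."""
    try:
        addr = ipaddress.ip_address(editor)
    except ValueError:
        return None  # a registered username, not an IP address
    for org, networks in WATCHED_RANGES.items():
        # An IPv6 address is simply never "in" an IPv4 network, and vice versa
        if any(addr in net for net in networks):
            return org
    return None
```

A bot would run this check on each edit in Wikipedia's recent-changes stream and post only the matches.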

Near the beginning of her talk, Yelton noted (as we observed several times over the conference) that this was the 10th year of Code4Lib. Her closing statement for her talk was that we need to spend the next 10 years building more library software.

I found this talk particularly energizing. As part of the public library minority at the conference, I also loved her answer to a question afterward from an academic librarian, who wanted to know what might be done to encourage wanderlust in an audience (college students) that seems mainly to go to the library to accomplish specific goals rather than to explore. Yelton didn't have an easy answer, but suggested that we might look to those in the library community with the most experience instilling wanderlust in their clientele: youth services librarians.

Code4Lib Day 2 - Presentations

The second day of Code4Lib was full of presentations: fourteen twenty-minute presentations and ten five-minute lightning talks filled the day from 9 until almost 5. As with day one, I'm listing them all here with my favorites highlighted. The presentation numbers are the overall numbers for the conference, not for day 2, which is why the first one listed here is presentation number eleven.

Presentation 11: Jennie Rose Halperin - Our $50,000 Problem: Why Library School?

This was a popular and genuinely challenging presentation, although it didn't have a great deal to do with code. Jennie Rose Halperin has a library science degree but doesn't work at a library; she works for Safari Books Online, a company that sells subscriptions to an electronic library of computer books.

Ms. Halperin started her presentation listing a few bits of advice that she got when she told people she was going to go to library school. The key bit of advice on which she focused her presentation was "Get as many tech skills as you possibly can."

This advice is notable because library schools have been extremely light on teaching technical skills. That isn't because working in libraries doesn't require technical skills. This conference was dedicated to people working in libraries, most of whom either had significant technical skills or were interested in learning from people who did. The problem is that as libraries have had to adapt to a rapidly changing information landscape, library schools have had difficulty adapting their curricula to prepare graduates for it. Even concepts that should be shoo-ins for a library school curriculum, like relational database design and SQL syntax, have not received adequate focus (based at least on my dated experience of library school and the comments of the presenter).

Meanwhile, the cost of master's degrees (across the board) continues to rise while entry-level pay for library jobs has generally stagnated. The presenter pointed out the absurdity of taking on $40K in school debt to get a job that, on average, pays something close to $32K a year.

Ultimately, there is a disconnect between the degree, its cost, and what students need to be learning. One might think library students aren't learning more advanced technical concepts out of a lack of interest, but Halperin argued convincingly that lack of exposure to technical skills is a bigger problem than lack of interest.

To resolve this problem in library schools, Halperin argued that we need scholarships, apprenticeships, curriculum reform, and professional development opportunities in the world of libraries.

Presentation 12: Margaret Heller, Christina Salazar and May Yan - How to Hack it as a Working Parent: or, Should Your Face be Bathed in the Blue Glow of a Phone at 2 AM?

Three working mothers made this impassioned plea for changes in how libraries (and really, all American businesses, particularly those with an IT component) treat women and mothers.

Early in their presentation, the three presenters provided this sobering statistic: "an American woman's earnings decrease by 4 percent for every child that she bears…after men have kids, their earnings increase, on average, by 6 percent". Essentially, men who take time away from their jobs are viewed more positively than women who take time away from theirs.

Part of the problem lies in U.S. policy. Unlike most other developed countries, the U.S. guarantees zero days of paid maternity or paternity leave. Canada, in particular, has a superior leave policy that makes it much easier for parents to take time off work. It is also easier there to hire temporary workers to fill in for someone on leave.

For workers anticipating extended periods of leave, it is a good idea to document their jobs well and to use generic email addresses. If necessary, creatively dividing one worker's responsibilities among several coworkers can serve as an alternative to hiring temporary workers.

Strict time management and project management tools can help working mothers work shorter days while maintaining productivity. This is a major concern, as people who used to work long days frequently need to shorten their time at work without reducing the amount of work they do. The presenters have found smartphones to be a major boon, allowing workers to stay in touch with coworkers and family and get work done wherever they are.

Presentation 13: Kevin S. Clarke - Docker? VMs? EC2? Yes! With Packer.io

Packer.io is a tool for building machine images. To use it, you write a template for Packer.io, which then uses a builder (Docker, VirtualBox, and VMware are examples) and a provisioner (Ansible and Bash are examples) to assemble and configure an image.

Packer.io and the tools it works with are designed for Linux and other POSIX operating systems. With it you can quickly install or reinstall a server configuration. Normally, if you set up a computer with a complicated piece of software and then need to rebuild the server, it can take a lot of work. Packer.io makes it possible to script most of this process so that building a new, identical server from scratch is fast and easy.
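The template described above is a JSON file with a list of builders and a list of provisioners. To stay in one language for this post's examples, this sketch generates a minimal template from Python. It pairs the Docker builder (`commit: true` saves the provisioned container as an image) with the shell provisioner. The base image and commands are placeholders, and a real template would usually include more settings.

```python
import json


def packer_template(base_image, setup_commands):
    """Return a minimal Packer template as a dict.

    Docker builder plus shell provisioner; base_image and setup_commands
    are placeholders for your own server setup.
    """
    return {
        "builders": [
            {"type": "docker", "image": base_image, "commit": True}
        ],
        "provisioners": [
            {"type": "shell", "inline": list(setup_commands)}
        ],
    }


if __name__ == "__main__":
    template = packer_template(
        "ubuntu:14.04",
        ["apt-get update", "apt-get install -y apache2"],
    )
    # Save this output as server.json, then run:  packer build server.json
    print(json.dumps(template, indent=2))
```

Keeping the template in version control is what makes the rebuild repeatable. Swapping the builder entry (for VirtualBox or EC2, say) reuses the same provisioning steps.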

This presentation described the basic process and went through a lot of the details. It was quite interesting, and Packer.io is something I'd like to learn more about.

Presentation 14: Axa Mei Liauw and Kevin Reiss - BYOB: Build Your Own Bootstrap

This was an intriguing presentation about creating a more flexible alternative to Twitter's Bootstrap framework. Bootstrap is an open package of Cascading Style Sheets and JavaScript that can be used to quickly assemble a decent-looking website. It is quite popular among web programmers with few design skills (not entirely unlike myself) because it can be used to rapidly put together a responsive website that looks decent without worrying too much about design particulars.

The presenters pointed out a few issues they had with Bootstrap. Bootstrap markup can be a little dense and heavy, and their BYOB alternative is cleaner in this regard. Bootstrap can't be made accessible (easy to use for people using screen readers or coping with other disabilities) without a plugin. Also, if a designer gets involved at any point, Bootstrap can be a frustrating environment (the presenters described it as the "Times New Roman" of the web).

To build their BYOB solution, they assembled a variety of web packages that work nicely together. First they "decoupled" the HTML and CSS in their design, which basically means making sure that the HTML markup and the CSS used to style it can each stand on their own.

Having done this, they used a CSS preprocessor called Sass to generate their CSS. The specific Sass syntax they used was SCSS, which looks largely like plain CSS, making it easier to maintain and less of a conceptual leap.

On top of this they used a Sass mixin library called Bourbon, an alternative grid system called Singularity, a Sass add-on called Breakpoint for writing media queries, and a set of default styles from a package called Bitters.

This combination produces code that is quick to edit and very flexible, fully accessible, and built on clean, easy-to-read markup without much redundancy.

This was a really interesting overview of a variety of powerful tools. I could have done with a much longer time to cover them in detail (everything felt a little rushed) but it gives me something to research on my own.

Lightning Talks II

Rights Metadata - This presentation described issues that the presenter was having with providing custom access rights on a system using a repository solution called Hydra. The presentation described their process to create a custom solution that would address these issues.

Code4Lib Japan - The presenter described the 2014 regional Code4Lib conference held in Sabae, Fukui, Japan, and the many open initiatives and projects that were discussed and worked on there.

Arduino as a Learning Platform - This presentation, made by Domenic Bordelon of East Baton Rouge Library, was quite interesting to me. Bordelon described the Arduino classes they have had there as a method of teaching programming. As I have taught an Arduino class at my library and am planning more, I was quite interested in hearing what he had to say. After his lightning talk I met up with him and shared some information. He gave me some good ideas which I'm hoping to mix into the classes we are already offering.

PreForma Project - This presentation had something to do with processing of files to go into archives.

PBCore RDF Ontology Hackathon - This hackathon occurred right before the conference. PBCore is the Public Broadcasting Metadata Framework and RDF is the Resource Description Framework. This presentation was largely over my head.

RDF, Fedora, and ActiveFedora for Relational Heads - Similar to the previous presentation, this tackled the use of RDF in the Fedora repository system.

Building a Bibframe Catalog - The presenters gave an overview of their work implementing a new web-based catalog for the Library of Congress using the Library of Congress's new Bibliographic Framework Initiative, known as BIBFRAME.

Mukurtu CMS - The presenters described an open source content management system called Mukurtu, designed for hosting digital collections that serve indigenous peoples in North America.

How Do We Become Better Developers - This lightning talk looked broadly at how to become a better software developer. The presenter's key points were: work in groups of at least two, since it is important to have someone else to work off of; buy training materials; read The Pragmatic Programmer by Andrew Hunt and David Thomas; and review one another's code.

Drupal, Git and Sanity - The presenter described managing the complicated code of the Drupal-based sites at their institution by keeping it in Git. That Git repository then serves as a basic model for all of their websites: to bring up a new site, they can just pull from the repository, which loads all of the current necessary files.

Presentation 15: Jessie Keck and Jack Reed - Making Your Digital Objects Embeddable around the Web

The presenters in this talk had a lot of digital objects and kept reinventing the wheel with their image viewer, and they hadn't always been happy with their approaches or solutions. They decided to use existing providers to distribute their materials (Hulu, Flickr, Twitter, YouTube, Instagram, Slideshare, Speakerdeck) and then use oEmbed to bring the resources hosted on those services into their website.

Using oEmbed, they not only made it easy for developers to embed multimedia from other sites in their website, but also implemented oEmbed on their own end so that others could embed content from their site in other sites.
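oEmbed is a small open protocol. A consumer sends a resource's URL to a provider's endpoint and gets back JSON containing ready-made embed HTML. This sketch shows both sides described above: building a consumer request (YouTube's public endpoint is used as an example) and producing a provider response for one of your own objects. The viewer URL is hypothetical, and this is not the presenters' code.

```python
import json
from html import escape
from urllib.parse import urlencode


def oembed_request_url(endpoint, resource_url, maxwidth=None):
    """Consumer side: URL that asks a provider for embed markup."""
    params = {"url": resource_url, "format": "json"}
    if maxwidth is not None:
        params["maxwidth"] = maxwidth
    return f"{endpoint}?{urlencode(params)}"


def oembed_response(title, viewer_url, width=640, height=480):
    """Provider side: a "rich" oEmbed response for one digital object.

    viewer_url is a hypothetical embeddable viewer page on your own site.
    The spec requires version and type; "rich" also requires html, width, height.
    """
    markup = (f'<iframe src="{escape(viewer_url)}" width="{width}" '
              f'height="{height}" frameborder="0" allowfullscreen></iframe>')
    return json.dumps({
        "version": "1.0",
        "type": "rich",
        "title": title,
        "html": markup,
        "width": width,
        "height": height,
    })
```

A consumer fetches the request URL and drops the returned `html` value into its page. Providers can also advertise their endpoint with a `<link rel="alternate" type="application/json+oembed" ...>` tag, so consumers can discover it automatically.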

Presentation 16: Naomi Dushay and Laney McGlohon - Digital Content Integrated with ILS Data for User Discovery : Lessons Learned

The presenters, from Stanford University (the same institution as the previous presenters), discussed another aspect of the same problem: their process of linking to and embedding digital objects directly from their library catalog.

Presentation 17: Wayne Schneider - Dynamic Indexing: a Tragic Solr Story

I found this presentation interesting because it involved a public library and described how they addressed some interesting problems, although it didn't seem particularly relevant to my own work: it isn't something I need to do, or could do without access I don't currently have.

The library wanted better indexing of its materials in its public-facing catalog, including live (or as close to live as they could get) availability information for downloadable materials.

Most public-facing catalogs nowadays don't query the main ILS database directly; they query their own separate database, which is much faster. Unfortunately, because that information doesn't come directly from the catalog system itself, it can end up out of date.

Their solution provides much of the functionality they would get if the catalog itself were being searched directly, but it is actually a separate database (in effect, it fakes a live search of the catalog). They have a rather complicated workflow in which they take a dump of their Horizon ILS, run it through a variety of processes, and load the output into an index for Solr, a widely used open source search server.

By automating this process and doing it frequently, they have managed to keep what appears to be a dynamic index of their catalog up-to-date when it is in fact hosted on a separate server.
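The last step of a pipeline like this, turning exported records into search documents and sending them to Solr, can be sketched briefly. This is a hypothetical illustration, not the library's workflow. The export field names (`bib_id`, `title`, `format`), the availability lookup, and the core URL are all assumptions. It relies on Solr's JSON update handler, which accepts a POSTed JSON array of documents. Documents with an existing `id` replace the old version, which is what keeps a frequently rebuilt index current.

```python
import json
import urllib.request


def to_solr_doc(bib, availability):
    """Map one exported bib record (hypothetical fields) plus the current
    availability counts into a flat Solr document."""
    return {
        "id": bib["bib_id"],
        "title": bib["title"],
        "format": bib.get("format", "unknown"),
        "copies_available": availability.get(bib["bib_id"], 0),
    }


def build_update_request(solr_core_url, docs):
    """POST request that adds or replaces docs via Solr's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{solr_core_url.rstrip('/')}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (not done here): urllib.request.urlopen(build_update_request(url, docs))
```

For very frequent updates, Solr's `commitWithin` parameter is often preferred to committing on every request.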

Presentation 18: Jason Thomale - You Gotta Keep 'em Separated: The Case for "Bento Box" Discovery Interfaces

This presentation had some intriguing aspects I had not expected. Its main point was an evaluation of the idea that patrons prefer a single search box with a unified results list. The presenter argued passionately that users do not like unified results lists and instead prefer results segregated by type (the way a bento box holds each kind of food in its own compartment).
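The difference between the two interfaces can be shown in a few lines. A unified list makes every kind of result compete in one ranking. A bento box gives each type its own compartment with its own top hits. This hypothetical sketch assumes results arrive already ranked, as dicts with `type` and `title` keys. In a real bento interface each compartment is usually fed by a separate search against its own source.

```python
def bento_box(results, compartments, per_box=3):
    """Group a ranked, flat result list into compartments by type.

    results: list of dicts with at least "type" and "title", best first.
    compartments: ordered list of types to display, e.g. ["book", "article"].
    Each compartment keeps its own top `per_box` titles; unlisted types are dropped.
    """
    boxes = {name: [] for name in compartments}
    for item in results:
        box = boxes.get(item["type"])
        if box is not None and len(box) < per_box:
            box.append(item["title"])
    return boxes
```

Even when articles dominate the top of a unified ranking, the book compartment still shows its best hit.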

What was more surprising than the conclusion was how they arrived at it. Using the Google Analytics API (I had never realized Google Analytics had an API, although in retrospect I guess this shouldn't surprise me), they collected two years of achingly detailed data on what users did with search results on their website: when they clicked on facets, when they applied limits, when they did follow-up searches, what kinds of searches resulted in what kinds of clicks in each compartment of their bento box interface, and so on.

They did some of the same kinds of things that I've seen done in interface evaluations before, but sucking data directly out of the Google Analytics API and then parsing it with their own code to find specific patterns was new to me, and quite impressive.
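I don't know what their analysis code looked like, and I won't guess at the Google Analytics API itself. But once event data has been pulled out and flattened into rows, the kind of pattern-finding they described can start with simple counting. In this sketch the event format, session IDs, and compartment names are all invented.

```python
from collections import Counter, defaultdict

# Hypothetical flattened event log: (session id, action, bento compartment or None).
events = [
    ("s1", "search", None),
    ("s1", "click", "books"),
    ("s2", "search", None),
    ("s2", "click", "articles"),
    ("s2", "search", None),  # a follow-up search in the same session
    ("s2", "click", "articles"),
    ("s3", "search", None),
    ("s3", "click", "books"),
]

def clicks_per_compartment(log):
    """Count result clicks landing in each compartment of the bento box."""
    return Counter(box for _, action, box in log if action == "click")

def sessions_with_followups(log):
    """Return, sorted, the sessions that ran more than one search."""
    searches = defaultdict(int)
    for session, action, _ in log:
        if action == "search":
            searches[session] += 1
    return sorted(s for s, n in searches.items() if n > 1)

print(clicks_per_compartment(events))
print(sessions_with_followups(events))  # ['s2']
```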

Presentation 19: Jason Casden and Bret Davidson - Beyond Open Source

The presenters opened with a brief history of the open source movement. They made the point that "Libtech" (a term used throughout the conference to refer to technology used for library purposes) should be able to thrive in diverse environments.

They argued that libraries have done a good job of creating a collaborative environment, but that it favors those who already have resources. The presenters felt that something needs to be done to help libraries with more limited resources implement technologies.

Although better installers and hosted or managed services might help, virtualization technology probably holds the most promise for bringing new services to libraries. Virtual environments and containers, created with software such as Vagrant and Docker, hold particular promise for filling this role. They argued that it is important to make open source software easier to install, maintain, and evaluate than it currently is.

The presenters have been working on a reference statistics collection project called Suma, which has seen relatively light adoption in more limited environments (including public libraries). They feel that installation is where most Suma users have the greatest trouble, and by making a Docker installer for it they hope to make it easier for libraries to try out and use the software.

As it happens I've been looking for a good piece of reference statistics collection software so this project holds a lot of interest for me and I look forward to testing it.

Presentation 20: Matt Connolly and Jennifer Colt - Awesome Pi, LOL!

This was a cute presentation describing how the presenters built and implemented an Awesomebox (Awesomebox.io) using a Raspberry Pi. The Awesomebox is a cool thing that I learned of last year at Computers in Libraries 2014. It was developed at Harvard with the idea that patrons could easily recommend materials as "awesome" just by scanning the barcode.

Seeing it done with a Raspberry Pi and a generally limited set of resources almost makes me want to implement it, if only I had an idea of where to put it.

Presentation 21: Rebecca Fraimow and Casey Davis - American (Archives) Horror Story: LTO Failure and Data Loss

A group of 120 public radio and television stations had large numbers of analog archival recordings that needed to be digitized and sent to the Library of Congress for public access. The presenters, who work for WGBH, worked on getting this done.

This was an interesting presentation, largely because the whole thing has kind of been a mess. For whatever reason, many of the digital files have had an extremely high failure rate when being re-encoded for delivery to the Library of Congress.

They initially had a failure rate of 57% in their digital conversion process. They wound up having to develop a process for identifying jobs that were failing and automatically rerunning them. Even after that they are still getting a 20% failure rate, and they aren't sure why. It could be that the files are corrupt, or it could be a problem with the files that would require a specialized program to work around.
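The presenters didn't describe how their rerun process works, but the general idea of automatically retrying failed jobs and setting aside whatever still fails can be sketched as follows. The function names, the retry limit, and the simulated encoder are all hypothetical.

```python
def run_with_retries(job, max_attempts=3):
    """Call `job` until it returns True or attempts run out.

    Returns (succeeded, attempts_used). `job` is any zero-argument callable;
    here it stands in for one re-encoding run.
    """
    for attempt in range(1, max_attempts + 1):
        if job():
            return True, attempt
    return False, max_attempts

def process_batch(files, encode, max_attempts=3):
    """Re-encode every file, rerunning failures; return files that never succeeded."""
    still_failing = []
    for name in files:
        ok, _ = run_with_retries(lambda: encode(name), max_attempts)
        if not ok:
            still_failing.append(name)
    return still_failing

# Simulated encoder: "b.mov" fails once then succeeds; "c.mov" always fails.
failures_left = {"a.mov": 0, "b.mov": 1, "c.mov": 99}

def fake_encode(name):
    if failures_left[name] > 0:
        failures_left[name] -= 1
        return False
    return True

print(process_batch(["a.mov", "b.mov", "c.mov"], fake_encode))  # ['c.mov']
```

Retries clear up transient failures; whatever is left in the returned list is the stubborn residue, like their remaining 20%, that needs a closer look.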

Presentation 22: Rob Sanderson and Naomi Dushay - Annotations as Linked Data with Fedora 4 and Triannon (a Real Use Case for RDF!)

This was a third presentation from Stanford University on the embedding and linking of objects; in this case they were focused on maps. They used a format called JSON-LD (JSON, or JavaScript Object Notation, being the data format from JavaScript that has become a de facto standard in many circles), which expresses linked data in a structure directly analogous to RDF using ordinary JSON.
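To give a sense of why JSON-LD lines up so neatly with RDF, here is a deliberately simplified example: a single annotation written in JSON-LD and a tiny function that reads it back out as subject-predicate-object triples. The oa: prefix is the W3C Open Annotation namespace; the IRIs for the annotation, map, and note are made up. This is not a conforming JSON-LD processor (a real one handles nested nodes, @vocab, remote contexts, and much more); it only handles one flat node with a prefix-only context.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A flat JSON-LD node; the example.org IRIs are invented for illustration.
annotation = {
    "@context": {
        "oa": "http://www.w3.org/ns/oa#",
        "dc": "http://purl.org/dc/elements/1.1/",
    },
    "@id": "http://example.org/anno/1",
    "@type": "oa:Annotation",
    "oa:hasTarget": {"@id": "http://example.org/maps/42"},
    "oa:hasBody": {"@id": "http://example.org/notes/7"},
    "dc:creator": "A. Cataloger",
}

def expand(term, prefixes):
    """Expand a compact IRI like 'oa:hasBody'; leave unknown prefixes alone."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term

def to_triples(doc):
    """Turn one flat JSON-LD node into (subject, predicate, object) tuples.

    Objects given as {"@id": ...} are IRIs; other values are kept as plain
    strings standing in for literals.
    """
    prefixes = doc.get("@context", {})
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, RDF_TYPE, expand(value, prefixes)))
        elif isinstance(value, dict) and "@id" in value:
            triples.append((subject, expand(key, prefixes), value["@id"]))
        else:
            triples.append((subject, expand(key, prefixes), value))
    return triples

for triple in to_triples(annotation):
    print(triple)
```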

Presentation 23: Kathryn Stine and Stephanie Collett - Consuming Big Linked Open Data in Practice: Authority Shifts and Identifier Drift

The presenters had wanted to synchronize their database with information in the VIAF (Virtual International Authority File). They had a really large database (9.5 million records), and because OCLC (which runs the VIAF) has a hard limit on the number of queries they could make against the VIAF each day, they had to spread the process of checking those 9.5 million records over a period of weeks. The problem is that the VIAF is constantly being updated, and so was their own database, so they ran into a huge amount of data drift. They didn't really have a great solution to this.
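The scale of the problem is easy to see with some back-of-the-envelope arithmetic. The talk didn't give OCLC's actual query limit, so the daily cap below is a hypothetical number chosen only for illustration; the batching helper is likewise just a sketch.

```python
import math

TOTAL_RECORDS = 9_500_000  # figure from the talk
DAILY_LIMIT = 250_000      # HYPOTHETICAL cap; the real OCLC limit wasn't stated

def days_needed(total: int, per_day: int) -> int:
    """Days required to check every record once under a daily query cap."""
    return math.ceil(total / per_day)

def daily_batches(record_ids, per_day):
    """Yield (day_number, batch) pairs, one batch of IDs per day of querying."""
    for day, start in enumerate(range(0, len(record_ids), per_day), start=1):
        yield day, record_ids[start:start + per_day]

days = days_needed(TOTAL_RECORDS, DAILY_LIMIT)
print(days)  # 38
# When the last batch is checked, the first batch's results are already
# (days - 1) days old, and VIAF may have changed in the meantime.
print(f"first batch is {days - 1} days old by the end of one full pass")

for day, batch in daily_batches(["a", "b", "c", "d", "e"], 2):
    print(day, batch)
```

Whatever the real cap is, the pattern holds: the longer one full pass takes, the further the earliest results can drift from what VIAF currently says.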

Presentation 24: Audrey Altman, Gretchen Gueguen, and Mark Breedlove - Heiðrún: DPLA's Metadata Harvesting, Mapping and Enhancement System

This presentation was made by three people from the Digital Public Library of America (DPLA). DPLA stores 8.4 million records from 23 hubs and 1,350+ contributing institutions. The data they receive arrives in nine different schemas, not all of them XML-based.

DPLA had a complex and problematic process for sucking in data and processing it. To ease things they developed a new system called Heiðrún, named after a goat in Norse mythology who eats leaves and produces mead (MEtAData). The presenters filled their presentation with goat allusions, which made it pretty fun. It was a detailed description of how their complicated system works.
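I can't reproduce Heiðrún's internals from my notes, but the general problem (many source schemas feeding one common model) is often handled with one mapping function per schema and a registry that picks the right one. This sketch shows only that generic pattern; the schema names and fields are hypothetical, and real input would first need to be parsed from XML or other formats.

```python
from typing import Callable, Dict

# Registry mapping a source schema name to the function that converts a
# parsed record in that schema into one shared (hypothetical) model.
MAPPERS: Dict[str, Callable[[dict], dict]] = {}

def mapper(schema: str):
    """Decorator that registers a mapping function for one source schema."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        MAPPERS[schema] = fn
        return fn
    return register

@mapper("simple_dc")
def map_simple_dc(rec: dict) -> dict:
    # Hypothetical Dublin Core-style input, already parsed into a dict.
    return {"title": rec["dc:title"], "creator": rec.get("dc:creator")}

@mapper("json_feed")
def map_json_feed(rec: dict) -> dict:
    # Hypothetical non-XML JSON input with its own field names.
    return {"title": rec["name"], "creator": rec.get("author")}

def harmonize(schema: str, rec: dict) -> dict:
    """Map one harvested record into the shared model, failing loudly on unknown schemas."""
    if schema not in MAPPERS:
        raise ValueError(f"no mapper registered for schema {schema!r}")
    return MAPPERS[schema](rec)

print(harmonize("simple_dc", {"dc:title": "Harbor at Dusk", "dc:creator": "Unknown"}))
print(harmonize("json_feed", {"name": "Mill Ledger"}))
```

Adding a new contributing hub's format then means writing and registering one more mapper rather than touching the rest of the pipeline.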

Friday, February 13, 2015

Code4Lib 2015 - Day 1 Presentations

Code4Lib has a different structure than most other conferences I have been to. Rather than multiple tracks, it has a single, densely packed track. On the first day that track had ten 20-minute presentations and a lightning round of another ten 5-minute presentations, for a total of 20 presentations. Previously I've tried doing a significant write-up for every presentation, but that is too overwhelming with a conference like this, so instead I'm writing a single, long post that broadly covers the activity of the day, pointing out in highlighted text the presentations that I particularly enjoyed or found useful.

Presentation 1: Becky Yoose (Grinnell College) - Your Code Does Not Exist in a Vacuum

This presentation began with a brief comparison of the different ways people think technology and society relate. The first view is technological determinism, in which technological innovation drives societal change (e.g., the printing press was a technology that resulted in the Reformation). The second view is technological constructivism, in which society drives which technologies are developed and adopted. This viewpoint strives to explain why revolutionary technologies are sometimes not adopted or fail to bring about changes in society.

The third relationship between technology and society that Ms. Yoose presented is technological somnambulism. In this view there is no driving force; both seem to stumble forward like sleepwalkers, unaware of the full impact each is having on the other.

Taking this view as the more likely correct one, Ms. Yoose went on to describe the potentially negative effects that adopting concepts from the world of programming can have on library culture. The first such concept was the Fast Fail, in which many ideas are tried and you quickly move on when one fails. Yoose pointed out that this model has many costs, frequently costs that institutions with limited resources may have trouble absorbing.

The second fad comes from the world of open source, where there is a hierarchy of coding circles with core contributors at the center and regular users on the outside. This makes coding the most valued skill, treats other skills and roles in a project as secondary, and casts end users as leeches sucking value from the work of others. In a culture like that of libraries, this is a potentially damaging way of looking at things, and a view that should be questioned, modified, and/or discarded.

Yoose closed with these questions:

As we make things work, what kind of world are we making?
Does the world we are making match the world we want to make?

This was a rapid, dense, and challenging talk, but overall a good and thought-provoking one.

Presentation 2: Sibyl Schaefer - Designing and Leading a Kick-a** Tech Team

Ms. Schaefer described the situation at an institution where she was charged with taking control of the "D-Team," a department of technology professionals. She started by asking staff outside the D-Team about their attitudes, values, and priorities. The D-Team was viewed as an insular group, and there was considerable anxiety about technology and change. She set the team's goals as providing access, custody of resources, and professionalism.

Following the formula "goals + values = consistency," it was then important to establish values that would let the team provide consistent service to staff. The team's values were set as: being service oriented/user centered, iterating toward a great product, being in the picture for the long haul, and continuously learning.

Ms. Schaefer then provided the following recommendations for assembling and managing a team:

When hiring - figure out what you need. Look for authenticity, values, curiosity, tech ability, archival background.
Balance making and managing.
In delegation - choose the right person for the task, agree on expectations, stay engaged, create accountability and learning.
Get and provide feedback.
In communication - individual weekly check-ins, weekly reports, and monthly reports are important.

Presentation 3: Erin White - Programmers Are Not Projects

Erin White followed Sibyl Schaefer with another look at managing programmers as a team. She emphasized the need to balance hard skills (easily defined skills that require a specific kind of knowledge) with soft skills (skills for managing people, which depend on context and on the people with whom you work). She then went over a number of examples of soft skills and how to use them to manage a team that works.

Presentation 4: Coral Sheldon-Hess - Leveling up Your Code with Code Club

Coral Sheldon-Hess described her experience getting together with others to improve their computer programming skills. This was an interesting concept that I think might be useful to me in some context, though I'm not yet sure where.

Code Club is a small social group (in Sheldon-Hess' example, all women) whose members get together on a regular basis to go over some sample code. They can find sample code in any number of places, and a different person is responsible for running the group each week. They spend an hour going over a bit of code, working out together what it does and why it was assembled the way it was.

In her case she described a group that met using Google Hangouts, scheduled their meeting time using Doodle, and was in the 4-7 person range.

It sounded like a good way for people to learn how to think about writing better computer code by studying what others have done to solve a problem.

Presentation 5: Bill Levay - A Semantic Makeover for CMS Data

This was an interesting presentation by a graduate student. He described a project he worked on in which JavaScript, SQL, and Python were used to tag photographs of jazz musicians so that they could be searched by who appears in them, when they were taken, and where they were taken.

What was most interesting to me was the (new to me) use of DBpedia and GeoNames URIs to link programmatically to data in other resources.
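For anyone who, like me, hadn't seen this before: as I understood it, the general idea is to tag a photo not with a plain-text name or place but with a URI that other linked data resources also use. A minimal sketch, assuming DBpedia's usual resource URI pattern and GeoNames' Semantic Web URI pattern; the photo record and the GeoNames ID are placeholders, not anything from the presentation.

```python
def dbpedia_uri(name: str) -> str:
    """Build a DBpedia resource URI from an English Wikipedia-style title.

    Simplified: only converts spaces to underscores; titles containing
    other special characters may need further encoding.
    """
    return "http://dbpedia.org/resource/" + name.strip().replace(" ", "_")

def geonames_uri(geoname_id) -> str:
    """Build a GeoNames Semantic Web URI from a numeric GeoNames ID."""
    return f"http://sws.geonames.org/{int(geoname_id)}/"

# Hypothetical photo record; 1234567 is a placeholder, not a real lookup.
photo = {
    "file": "photo_0413.jpg",
    "people": [dbpedia_uri("Miles Davis"), dbpedia_uri("John Coltrane")],
    "place": geonames_uri(1234567),
    "date": "1958",
}
print(photo["people"][0])  # http://dbpedia.org/resource/Miles_Davis
```

Because the tags are shared URIs rather than strings, a search interface can follow them to pull in biographical or geographic data that lives elsewhere.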

Presentation 6: Jason A. Clark and Scott W. H. Young - Your Chocolate is in My Peanut Butter! Mixing up Content and Presentation Layers to Build Smarter Books in Browsers with RDFa, Schema.org, and Linked Data Topics

The presenters, from Montana State University, described their work digitizing books so that the result has more of the advantages of materials native to the Internet, such as hyperlinks to related external resources, full-text searchability, and the ability to easily link to specific sections within the book.

A couple of examples of their finished product can be found here:
Home Cooking
Opsis Literary Arts Journal

Presentation 7: Anne Wootton - Helping Google (and Scholars, Researchers, Educators, & the Public) Find Archival Audio

This was an interesting session, if only to learn about the site the speaker develops. Anne Wootton is one of the founders of PopUpArchive.com, a site that mainly works with other agencies to create text-searchable audio. Through their process, which took a great deal of effort to refine to the point where it works now, they can take a recording in English and generate a rough transcript. Each piece of the text is then associated with the point in the recording where it occurs, which lets you search the audio with ordinary text and be dropped into the recording right where that text occurs.

The process also creates what is essentially the audio equivalent of thumbnails: a list underneath the player shows times in the audio and the text being spoken at each one.

To do this they use the search tool Sphinx and the software Kaldi for speech-to-text. The speech-to-text tool was trained on oral history and public media sources in English, because that is the kind of audio they typically process. At the moment their system doesn't work with languages other than English.
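I don't know how Pop Up Archive wires Kaldi and Sphinx together, but the core idea of searching audio with text and jumping to the right spot can be illustrated with a simple index from words to segment start times. The transcript segments here are invented, standing in for speech-to-text output, and the function names are my own.

```python
import re
from collections import defaultdict

# Hypothetical transcript: (start time in seconds, recognized text) segments.
segments = [
    (0.0, "Welcome to the oral history project"),
    (12.5, "I grew up near the river"),
    (31.0, "the river flooded every spring"),
]

def build_index(segs):
    """Map each lowercased word to the start times of segments containing it."""
    index = defaultdict(list)
    for start, text in segs:
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].append(start)
    return index

def search(index, word):
    """Return the times to jump to in the recording for a one-word query."""
    return sorted(index.get(word.lower(), []))

def fmt(seconds):
    """Format seconds as M:SS, as in a list of times shown under a player."""
    m, s = divmod(int(seconds), 60)
    return f"{m}:{s:02d}"

idx = build_index(segments)
print([fmt(t) for t in search(idx, "river")])  # ['0:12', '0:31']
```

The same segment list, with each start time formatted next to its text, is essentially the "audio thumbnail" list described above.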

Lightning Talks I

Lightning talks are five-minute presentations given by attendees and scheduled during the conference itself (rather than in advance, like all of the main presentations). During the conference there were three blocks of lightning talks, each with 9 to 10 short presentations. The presentations in this first round (with extra description only for the ones I was particularly interested in) were:

Automated Entity Extractions to Relate Library Resources
Open Source Digital Archiving Toolkit, ResCarta - This could be interesting if I ever need such a product, which I guess I might, eventually. It is a piece of software written in Java that serves as a kind of one-stop shop for processing different kinds of files.
Information Design Thoughts - The main thing I got out of this was a book recommendation: The Design of Everyday Things
VuFind & WorldCat Discovery API
Video Accessibility on the Web - This was an informative and entertaining talk demonstrating very quickly how a file format called WebVTT makes it easy to add text tracks to a variety of video formats. This is important if you want to make video accessible. It is also an easy way to make a karaoke video, which they demonstrated with a segment of the music video for Tubthumping by Chumbawamba. This resulted in a considerable number of people joining in on the chorus, which was pretty much where their demo video cut out.
Teaching Your ILS How to Accept Money for Fines - We are already doing this through our consortium and seeing a discussion of the process made me glad that someone else has taken care of it.
Fedora 4 Migration
LDPath - A demonstration of using this software to extract data about Portland from GeoNames.
Self-deposit of Scientific Data
Bread (How Baking Bread Made Me a Better Programmer) - Making bread made the presenter a better programmer. Learning to do this 1.) helped him embrace his fears, 2.) taught him to learn the components (he reveres Alton Brown's I'm Just Here for More Food), 3.) showed him that patterns are useful, 4.) showed him that patterns have to fit into a larger context, and 5.) taught him to learn other styles.
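The WebVTT format from the video accessibility talk really is simple: a "WEBVTT" header line, then cues made of a timing line and the text to show. A minimal sketch of generating one from a list of timed cues follows; the cue text is placeholder text, and the sketch skips optional features such as cue identifiers and cue settings.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def to_webvtt(cues) -> str:
    """Render (start, end, text) cues as the contents of a .vtt file."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)

print(to_webvtt([(1.0, 4.5, "Placeholder caption one"),
                 (4.5, 8.25, "Placeholder caption two")]))
```

The resulting .vtt file can be attached to an HTML5 video with a track element, for example <track kind="captions" src="captions.vtt" srclang="en">.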

Presentation 8: Eben English - Boston Public Library - Book Reader Bingo

This presentation was different from what most people would guess it was about (I would think they would guess e-book readers, like a Kindle) or what I thought it might be about (a book scanner). Instead, it was an overview of the different products that can be used to display a book within a browser, with a detailed comparison of the features and drawbacks of each. It was an entertaining and well-done talk which could come in very handy if I ever need to embed a book into a site for on-screen reading (and given that's kind of what we do with our newsletter nowadays, this could happen sooner than I would otherwise anticipate).

Presentation 9: Megan Kudzia and Kate Sears - Leveling Up Your Git Workflow

In this talk, the presenters described their problems with Git (for the most part they were talking specifically about GitHub, the best-known Git hosting service) and how their experience changed the way they thought about the tool. They went on to describe how those changes in their thinking helped them use Git more effectively.

Presentation 10: Terry Brady (Georgetown University) - Got Git? Getting More Out of Your GitHub Repositories

This session had a variety of tips for using GitHub to collaborate on code and to communicate about or annotate code hosted on GitHub. It tied in very nicely with the presentation before it. I have used GitHub some, and it is an important resource for downloading projects in development (for my purposes it has largely supplanted the older SourceForge). There were several interesting tidbits, but I'm not sure how soon I will be using them.