Artifacts in the Archives

The following slides and text are from a presentation at the Society of Florida Archivists/Society of Georgia Archivists Joint Annual Meeting in Savannah, GA on October 14, 2016.

The full dataset can be downloaded here.

[Slide 01]
So with this transition from the grant-funded project to our regular UF operations, I was tasked with creating a processing plan for what remained unprocessed from the museum collection. This included a large assortment of artifacts and artwork, along with your more standard archival documents and photographs. John and I met with the three members of the Panama grant project team, went over the work they had done so far, and tried to gain an understanding of their processes and the work that had been completed. The way processing had been handled over the previous 2 years did not match how we would have preferred the work to be done, and it somewhat complicated things from our archival viewpoint. Still, we decided to continue in the same manner to ensure that the entire collection got processed; while it wasn’t ideal, keeping with the earlier practices would at least create fairly consistent control and description of the collection.

[Slide 02]
I set out to create the processing plan – actually my first-ever solo processing plan – by surveying the collection holdings at our off-site storage facility, where the majority of the unprocessed items and records were held. The results of the survey showed that we had about 7 linear feet of archival documents; over 200 framed art pieces, maps, and similar works; and almost 4,000 artifacts left to process. Now, I’m well trained in archival processing and come from a long line of MPLP-style work, having received my early hands-on processing training with one of the PACSCL Hidden Collections projects in Philadelphia. I keep stats on my own processing whether administrators request it or not, and I’ve implemented some of the same processes for metrics tracking at UF. So I was pretty secure in estimating the needs for processing those 7 linear feet of archival records and photographs.

[Slide 03]
What I wasn’t sure about was how to estimate processing of the art and artifacts. At PACSCL, we dealt with a small number of artifacts and tended to keep them within the archival collections. I also worked with the National Park Service for about a year, but there, artifacts were removed and processed by someone else. I headed to the web, as you do, to look for information on processing times for artifacts, but didn’t come up with anything of much use. The Park Service has a lot of information on how to budget money for artifact processing, but doesn’t include information about time in its manuals. There was scant information available from other sources, so I ended up making an educated guess and crossed my fingers (in the end, I guessed a bit too low).

[Slide 04]
But this made me wonder: with our love of stats and assessment, why aren’t some general numbers for artifact processing available somewhere?

[Slide 05]
I posed this question to John and he agreed. He had looked for this type of data before and found very little. He recalled a few times in the past when archivists or other professionals had posed this question to the SAA listserv, and noted that they were generally met with responses about the unique nature of artifacts and how one couldn’t possibly generalize processing times for artifacts or artwork. Archivists used to say the same thing about our own paper collections, yet we somehow managed to move on to the understanding that minimal processing usually takes around 4 hours per linear foot and item-level processing tends to take 8 to 10 hours per foot. So I thought: we can do better. And with the advice and encouragement of my dear supervisor … a research project was born.

Along with John, I formed a small but professionally diverse group that included Lourdes and Jessica, John’s highly knowledgeable wife Laura Nemmers, and a colleague from the Ringling Museum in Sarasota, Jarred Wilson. We started working on a survey to pose to archivists and museum professionals to figure out what data people had and how we could aggregate it into a generalized form that would be useful for budgeting and planning future processing projects. As is the focus of our talk here, this issue is becoming more and more common, and we all thought these sorts of metrics would prove useful to others in the future.

[Slide 06]
In our first meeting we spent a lot of time deciding how to collect the data and also discussing terminology. Having a group with mixed archival and museum backgrounds led to discussions of what exactly each of us meant when we said accessioning, processing, inventorying, and other such terms. Where I say process, Jessica may say accession. Where I say minimal record, she may say inventory entry. Further research and discussions showed that even within one segment of the community, these terms didn’t describe the same tasks for everyone. So, we began to think that we should survey people about terminology before surveying them about data – to make sure we asked the right questions.

When we next met, we went over the survey I had devised to try to get a grip on the terminology questions – but it was still confusing and not actually getting at the point we were after. We also knew that surveys tend to have a small response rate, and we didn’t want to overburden the people who might participate in this project. So, back to the drawing board we went. Instead of asking people what they meant by each term and then asking how much time they spent doing the tasks described, we decided to cut to the chase: describe the actions we meant and see whether they had data they could share, or whether they would agree to collect some data and send it to us.

[Slide 07]
I sent out a general email asking for people who might be interested in taking part in a research survey regarding artifact processing within archival settings. From that first request, I received 31 responses from people interested in taking part in or learning more about the project. Then, once we had sorted out exactly what to ask for and how to format the data, I sent another, more specific request to just the people who had initially responded. After sending out that request, a number of people dropped out, and in the end only 6 people submitted data.

[Slide 08]
But those 6 institutions (7 when we include UF) represented a wide variety of institution and record types, with participants including archivists, curators, and managers from academic institutions, museums, federal and city government, and public libraries.

[Slide 09]
As for the data, we had devised a set of 9 categories of artifacts that grouped different sorts of items together based on size or complexity, along with a general idea of how long each would take to describe. Of the institutions that participated, 4 used these categories to collect data, while the other 2 sent in more generalized information based on how they normally collect or devise processing times. At UF, we did a bit more processing of the artifacts with these categories in mind, since metrics had not been collected during the first 2 years of the project and it seemed like a good idea to have our own data involved.

[Slide 10]
Here you can see the data parsed out by category, showing the average amount of time for either minimal or full processing across the 9 categories. The entries marked “null” mean that no data was received in that category for that level of processing. (And you may notice the one outlier in category 3, where each item took almost 3 hours to process. Those were some pretty intense dioramas that skewed the data wildly for that category, but they don’t have much of an impact on the final averages.)
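For anyone curious how numbers like these get rolled up, here is a hypothetical sketch of that aggregation step in Python, assuming the submissions were normalized into rows of institution, category, processing level, and minutes per item. The column names and file layout are my assumptions for illustration, not the project’s actual spreadsheets.

```python
# Hypothetical aggregation sketch: average minutes per item for each artifact
# category at each processing level. Column names and filename are assumptions.
import pandas as pd

df = pd.read_csv("artifact_processing_times.csv")  # assumed input file

# Categories with no submissions at a given level come out as NaN,
# which corresponds to the "null" entries on the slide.
summary = (
    df.groupby(["category", "level"])["minutes_per_item"]
      .mean()
      .unstack("level")            # columns become the processing levels (minimal, full)
      .reindex(range(1, 10))       # show all 9 categories, even empty ones
)
print(summary.round(1))
```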

[Slide 11]
Here you can see the average overall processing times in a few different ways. Processing time for the categorized items comes in at around 8 minutes per item for minimal processing and almost 22 minutes per item for full processing. All of the categorized processing averages out to just over 19 minutes per item. When I couple this data with the numbers from the other 2 institutions that only sent in generalized data, the final number goes up by only about 20 seconds. So what we have in the end is that, with or without the categories, an artifact can generally be expected to take roughly 20 minutes to process (I had estimated 10 minutes in my processing plan). This is an aggregate, so obviously the processing times of individual items will vary dramatically. But for large collections of objects, knowing that you have, say, 2,000 or so items to process at roughly 20 minutes per item allows an institution to at least propose a relatively reliable timeline (5 to 6 months) for project planning and budgeting.
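To make the arithmetic behind that 5-to-6-month figure explicit, here is a quick back-of-the-envelope calculation. The 20 minutes per item comes from the aggregated data above; the number of hands-on processing hours per week is my own assumption for illustration, not a figure from the study.

```python
# Rough timeline estimate for a large artifact processing project.
items = 2000
minutes_per_item = 20              # aggregate figure from the data above
processing_hours_per_week = 28     # assumed hands-on hours; adjust for your staffing

total_hours = items * minutes_per_item / 60        # about 667 hours
weeks = total_hours / processing_hours_per_week    # about 24 weeks
months = weeks / 4.33                              # about 5.5 months

print(f"~{total_hours:.0f} hours of processing, roughly {months:.1f} months of work")
```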

I would like to see a larger dataset to create more useful guidelines for processors going forward, and we’re continuing to collect numbers at UF, but for now, this is what we have. Also, just a quick thanks to everyone who participated in this study.

[Slide 12]

Code4Lib 2016 Conference Review

This post was written for, and first appeared on, SAA’s SNAP roundtable blog.

Code4Lib 2016 was held in Philadelphia, PA from March 7 to 10 along with a day of pre-conference workshops. The core Code4Lib community consists of “developers and technologists for libraries, museums, and archives who have a strong commitment to open technologies,” but they are quite open and welcoming to any tangentially related person or institution. As a processing archivist whose main experience has been with paper documents, I thought I would feel confused and out of place for the length of this conference, but, while I had my moments, I left feeling more knowledgeable about efforts and innovations within the coding community, giddy with ideas of projects to bring to my own workplace, and incredibly glad that I stepped outside of my archival comfort zone to attend (and present at!) this conference. (And I have to thank our university’s Metadata Librarian, Allison Jai O’Dell, for asking me to present with her. Without her reaching out to me, I likely wouldn’t have gotten involved in the conference to begin with.)

So, before Code4Lib, there was Code4Arc – at least, as a preconference workshop. Code4Arc focused on the specific coding and technology needs of the archivist community and on the need to make Code4Arc an actual thing, rather than just an attachment to Code4Lib. While both communities would have quite a bit of overlap, archivists obviously have their own niche problems, and coders can often help sort those problems out. Also, having a direct line between consumer-with-a-problem and developer-with-a-solution would prove quite beneficial to all parties involved. The day was divided up into a series of informal discussions and more focused breakout groups, along with some updates from developers. The end result mainly boiled down to continuing the discussion about our needs as a community, communicating and sharing knowledge and data more openly, and focusing efforts on specific problems that affect many archives. We’ve formed some ad hoc groups and will likely have more to say in the not-too-distant future.

As to the conference proper, I’ll start by noting that a ton of information is available online. The conference site lists presentations, presenter bios, and links to Twitter handles and slides where available. Three series of Lightning Talks emerged during the conference; information on those talks can be found on the wiki, which is full of useful information and links. And everything was recorded, so you can watch the presentations on the Code4Lib YouTube channel. The conference presentations were almost a series of lightning talks themselves: each presentation was allotted 10 to 20 minutes, with 6 groups of presentations given over the course of the conference, along with 2 plenary talks. So, while it was a nice change from the general conference configuration, it did make for a rather exhausting (but engaging) experience. Having said that, I will only specifically mention a few of the presentations that resonated more with me or relate more specifically to archival work (because seriously, I saw over 50 in the course of 2.5 days). But again, I stress, totally worth it! And they feed you. A lot!

So on day one (inserts shameless plug), Allison Jai O’Dell and I presented The Fancy Finding Aid (video | slides). We talked about some front-end design solutions for making finding aids more interactive and attractive. Allison is wicked smart and also offered up a quick lightning talk on day three about the importance of communicating, often informally, with your co-workers (video). Other presentations of note from day one include Shira Peltzman, Alice Sara Prael, and Julie Swierzek speaking about digital preservation in the real world in two separate presentations, “Good Enough” preservation (video | slides) and Preservation 101 (video | slides). Eka Grguric broke down some simple steps anyone can take towards Usability Testing (video | slides) and Katherine Lynch shared great ideas regarding Web Accessibility issues (video | slides). Check out the slides for lots of great links and starting points, like testing out navigability by displacing your mouse or using a screen reader with your monitor off.

[Image: Matienzo, Ever to Excel]

Later on, Mark Matienzo discussed the ubiquity of the spreadsheet in Ever to Excel (video | slides). The popularity of spreadsheets may come from the hidden framework that shields users from low-level programming, making them feel more empowered. Lightning talks included the programming committee asking for help with diversity in #ProgramSoWhite (video | slides), a focus repeated the following day in a diversity breakout session. Ideas generated from the diversity talks focused on further outreach to schools and professional organizations, scholarship initiatives for underrepresented populations and newer professionals, and the need for those in the coding community to reach out to collaborators in other areas to bring new voices into the community.

Angela Galvan, in her talk titled “So you’re going to die” (video | related notes), spoke about digital estate management and the need to plan for what happens to digital assets after someone dies. Though humans now post so much of their lives online, we are still relatively silent about death. Yuka Egusa’s talk about how non-coders can contribute to open source software projects was particularly popular (video | slides). She notes that engineers love coding, but generally don’t like writing documentation. Librarians and archivists can write those documents and training manuals, and we can aid with reporting bugs and usability testing. Don’t let lack of coding knowledge keep you from being part of innovative programs that interest you.

[Image: Yoose, #libtech burnout]
Day two: Becky Yoose gave an exhilarating talk about protecting yourself from #libtech burnout (video | slides). In the lightning talks, Greg Wiedeman spoke about his Archives Network Transfer System (video | more info), which is an interesting solution to a problem Code4Arc focused on, but which also highlights the need for a simpler way to structure the process of transferring digital materials to the archives.

Andreas Orphanides gave a great talk about the power of design. Architecture is Politics (video | slides) highlighted how, intentionally or not, your web and systems designs are political; likewise, politics influence your design. The choices you make in design can control your user, both explicitly and subtly, and politics can influence the choices you make in the same way. Thus, design is a social justice issue, and you need to be active in knowing your users, recognizing your own biases, and diversifying your practices. Matt Carruthers talked about Utilizing digital scholarship to foster new research in Special Collections (video | slides). This project at the University of Michigan provides visualization-on-demand customized to a patron’s research question. Though still in the early stages of development, the project extracts data from the EAD finding aids Michigan already has in order to create EAC-CPF connections. That data is then used to visualize networks, and online access to the visualizations is offered to users. This is the start of a fascinating new way to provide further discovery and access in archives and special collections.
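To give a concrete sense of the kind of extraction step that sort of work involves, here is a minimal, hypothetical sketch: pulling personal and corporate names out of EAD 2002 finding aids and counting co-occurrences that could feed a network visualization. The file paths and the use of simple co-occurrence as the link are my assumptions for illustration, not the Michigan project’s actual code or workflow.

```python
# Hypothetical sketch: harvest <persname>/<corpname> values from EAD 2002 files
# and count how often pairs of names appear in the same finding aid, as a crude
# basis for a network visualization.
import glob
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

EAD_NS = "urn:isbn:1-931666-22-9"  # EAD 2002 namespace

def names_in_finding_aid(path):
    """Return the distinct personal/corporate names found in one EAD file."""
    tree = ET.parse(path)
    names = set()
    for tag in ("persname", "corpname"):
        for el in tree.iter(f"{{{EAD_NS}}}{tag}"):
            text = " ".join("".join(el.itertext()).split())
            if text:
                names.add(text)
    return names

# Tally co-occurring name pairs across a folder of finding aids (assumed layout).
edges = Counter()
for ead_file in glob.glob("finding_aids/*.xml"):
    for a, b in itertools.combinations(sorted(names_in_finding_aid(ead_file)), 2):
        edges[(a, b)] += 1

# The weighted pairs could then be exported to a graph tool for visualization.
for (a, b), weight in edges.most_common(10):
    print(f"{a} <-> {b}: co-occur in {weight} finding aids")
```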

Day three’s lightning talks included Sean Aery from Duke speaking about the integration of digital collections and finding aids, and some great ways to maintain context while doing so (video | slides); Heidi Tebbe recommended the use of GitHub as a knowledge base, not just a place for code (video | slides); and Steelsen Smith pointed out the various issues that can arise with assorted sign-ons and how single sign-on can actually open up systems to more users (video | slides).

And lastly, Mike Shallcross discussed a University of Michigan project that I’ve been following closely, the ArchivesSpace-Archivematica-DSpace Workflow Integration (video | slides). They are working to overhaul archival management by bringing ArchivesSpace and Archivematica together with a DSpace repository to standardize description and create a “curation ecosystem.” We’re closing in on a similar project where I work, and Mike has been making regular (and rather entertaining) blog posts about the Michigan project, so it was good to hear him in person. (If interested in more, check out their blog.)

[Image: Orphanides, Architecture is Politics]
Oh, the plenary talks. I almost forgot. They were great. The opening talk by Kate Krauss of the Tor Project focused on social justice movements in the age of online surveillance (video | slides), and the closing talk by DuckDuckGo founder Gabriel Weinberg (video) similarly focused on privacy and related concerns in online searching.

So, it was a great conference. There were definite themes emerging around creating better access and more privacy for users; getting out of your normal routine and envisioning projects from another perspective; communicating better and more openly within and around our own community; and using all of this to better document and support underrepresented communities around the world. I’ve now said too much. I hate reading long blog posts. But I definitely recommend this conference to anyone in the library and archives fields with any inkling of interest in digital projects. It’s a great way to get new ideas, see that you aren’t alone with your out-of-date systems, and meet some great people you may not normally get to interact with.

code4lib 2016: The Fancy Finding Aid

Fancy Finding Aid – PowerPoint slides (PDF format, 1 MB)
Fancy Finding Aid – PowerPoint files

Video of presentation:

Reaction on twitter: