Step Two: Process All the Data

Where was I? Oh yes, in my last processing-related post, I had just finished the physical processing of my collection. I’ve now finished all of the folder title data entry and am in the final stages of editing scope notes and other descriptive text. (Side note: I probably need a “Step One-Point-Five: Arrange All the Records” post, but frankly, it’s not that intriguing. “Put things in order.” Done! Back to step two.)

Excel and the joys of XML (which I hardly understand myself) saved me a bunch of time in this process. This collection has 534 boxes (plus some flat files and records not physically in my possession), with 4,631 folder titles. And I spent about 50 hours typing them all into the computer. So far, I’ve also spent about 35 hours editing that data and the scope notes for the collection.

The actual final (post-arrangement and error checking) numbers on the collection size are these:

  • Original collection size: 220 linear feet
  • Final collection size: 184.38 linear feet (or 180.48 cubic feet), about a 16% reduction in physical size
Wrangell-St. Elias National Park and Preserve
awaiting labels

Processing speed depends on how you look at it. Based on the original size of the collection (which is how I’ve always done this), I’m at just over 3 hours per linear foot. Very speedy. Based on the final size, I’m at 3.58 hours per linear foot or 3.66 hours per cubic foot. Still pretty speedy. Data entry averaged out to a bit over 3.5 feet per hour. And now I’m left with writing and editing narrative text for the finding aid and putting labels on things (post-its are not exactly kosher in archives-land), which is great because . . .
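For anyone who wants to check the math, the arithmetic behind these figures can be sketched in a few lines of Python. The collection sizes, the 50 hours of data entry, and the 3.66 hours per cubic foot are taken from this post. The total number of hours is never stated outright, so the sketch backs it out from the cubic-foot rate; treat that value, and all of the variable names, as assumptions rather than reported figures.

```python
# Figures quoted in the post.
ORIGINAL_LINEAR_FEET = 220.0
FINAL_LINEAR_FEET = 184.38
FINAL_CUBIC_FEET = 180.48
DATA_ENTRY_HOURS = 50.0
HOURS_PER_CUBIC_FOOT = 3.66


def percent_reduction(before: float, after: float) -> float:
    """Return the size reduction as a percentage of the starting size."""
    return (before - after) / before * 100.0


reduction = percent_reduction(ORIGINAL_LINEAR_FEET, FINAL_LINEAR_FEET)

# Data entry speed: final linear feet described per hour of typing.
data_entry_rate = FINAL_LINEAR_FEET / DATA_ENTRY_HOURS

# ASSUMPTION: total project hours are not given in the post, so they are
# inferred here from the stated hours-per-cubic-foot rate.
inferred_total_hours = HOURS_PER_CUBIC_FOOT * FINAL_CUBIC_FEET
hours_per_original_foot = inferred_total_hours / ORIGINAL_LINEAR_FEET

print(f"Reduction in physical size: {reduction:.1f}%")
print(f"Data entry rate: {data_entry_rate:.2f} linear feet per hour")
print(f"Inferred total hours: {inferred_total_hours:.0f}")
print(f"Hours per original linear foot: {hours_per_original_foot:.2f}")
```

Measuring against the original size always gives the more flattering number, since the same hours are spread over more feet.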

I’m moving to Florida in 2 weeks. I’ve accepted the position of “Processing Archivist” at the University of Florida and I’m very happy to be moving back close to home and taking a job which sounds interesting and has no attached end-date. I’ll be processing across all of their collections, so stay tuned for further tales of processing and, hopefully, humorous finds via Instagram/Twitter (see buttons above). While Alaska has been a unique experience, I’m overjoyed for this new opportunity and happy to be ahead of schedule and finishing up this project before leaving.

Step One: Process All the Papers

Today I reached a small(-ish) but significant milestone: I have finished the physical processing portion of my project. Up here in Anchorage, I’m working to process and describe all the records of the Wrangell-St. Elias National Park and Preserve (the largest National Park in America, I might add). Now I just have to do some arranging and a ton of data entry and write some descriptive text and, voilà, done!

post-processing, but pre-arrangement
the records after processing, but before final arrangement

So – numbers. Everyone enjoys numbers. I’ve been clipping along pretty quickly, doing mainly ‘minimal processing’-inspired work (with preservation aspects thrown in for various media formats). When I first arrived, I was presented with about 100 linear feet of materials. A few months later, another 120 or so linear feet were added, making up pretty much the entire archival collection of park records (I later learned that there are some boxes scattered about the park office still, but it’s winter here and the office is hundreds of miles away, so those records will likely be added to this collection a bit later, perhaps in spring).

Sharpening an axe, undated image
culling collections like a skilled lumberjack (from the collection, undated)

All-told, the collection came to 223 linear feet (pre-processing). After processing, it is down to 176 linear feet. This will likely change as the final arrangement comes into play, but that’s where it stands at the moment. So – where did all that stuff go anyway? Did I magically get rid of over 47 linear feet of material? Well, no. As much as I love to throw things away, the vast majority of that decrease is simply due to housing. Of the 47 linear feet, I’m responsible for removing just over 8. This material was either out of scope or redundant. (A funny thing happens when you amass decades’ worth of small collections that aren’t cross-checked with each other – you get a lot of the same stuff repeating itself.) Anywho, the remaining 39 linear feet came from removing notebooks; providing better housing for photographs, slides, and assorted video formats; and simply not leaving excess empty space in the back of boxes.
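The breakdown above is simple subtraction, but a small sketch makes the proportions easier to see. The three input figures come from this post (with "just over 8" rounded to 8); the variable names are only illustrative.

```python
# Figures from the post, in linear feet.
PRE_PROCESSING_FEET = 223
POST_PROCESSING_FEET = 176
WEEDED_FEET = 8  # "just over 8": out-of-scope or redundant material

total_removed = PRE_PROCESSING_FEET - POST_PROCESSING_FEET
rehoused_feet = total_removed - WEEDED_FEET  # saved through better housing

for label, feet in (("weeded", WEEDED_FEET), ("rehoused", rehoused_feet)):
    share = feet / total_removed * 100
    print(f"{label}: {feet} linear feet ({share:.0f}% of the reduction)")
```

In other words, more than four-fifths of the space savings came from housing decisions rather than from discarding anything.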

Additionally, from tracking the time spent on various tasks, I know my current processing rate is at about 2.5 hours per linear foot. Obviously this will increase as I continue with arrangement and description, but it’s good and I’m happy with the progress I’ve made so far. Coming fairly fresh from university and rather collaborative processing environments, it’s been quite the experience being largely on my own here and learning to trust my instincts and training more. However, it’s definitely been a great thing to have that support network of friends and archivists to reach out to for advice. Now, on to those 40-50 pound boxes I keep reading about.

Jump In Too survey report

This post consists of the final report for the Society of American Archivists’ Manuscript Repositories Section’s Jump In Too 2014 project. More information and the original files are available on the project website.

floppy disk image
Window to Your Future, Drexel University, 1990

The Drexel University Archives and Special Collections houses roughly 2,000 linear feet of records that document the founding, growth, and functions of the University. The Archives also serves as an educational resource, encouraging the use of the records by researchers and the general public. Prior to the addition of a Records Management Archivist to the staff in 2011, the Archives did not consistently describe digital holdings in accession records. After 2011, accession records include basic description of electronic records and digital media. The Jump In, Too/Two project marks the start of a longer-term project to improve preservation of and access to digital files accessioned by the Archives.

We chose to survey the entire holdings of the Archives during this project. The survey process began by looking at accession records in Archivists’ Toolkit (AT) to gain an overall picture of the holdings. After this review, we decided to only examine materials originating after 1990. We believed that little digital material would be found in pre-1990 collections. Additionally, the resources required to preserve digital material from this long ago would likely prove cost-prohibitive to the archives. An exception was made for the papers and collections related to the Microcomputing Program of the 1980s, which required all Drexel students to purchase Macintosh computers and resulted in the creation of original software; these collections were more likely to contain digital storage materials and are also of high research value.

The survey was conducted by Drexel University Records Management Intern Steven Duckworth. He created an online survey to capture the required information and identify the collections that would be surveyed. This survey worked well as anyone with an internet-connected device (computer, iPad, smartphone, etc.) could easily fill in the requested information. This allowed additional Archives staff to enter information on materials discovered when processing their collections and avoided redundancy in multiple passes through collections. This data was automatically saved in a Google Docs spreadsheet that was later exported as an Excel file for ease in analysis. The Records Management Archivist provided guidance throughout the process.

The entire survey process took about 120 hours over the course of two months. The initial stages of background research, survey creation, and collection designation took roughly 50-60 hours. The bulk of the physical survey of collections took another 50-60 hours, with additional materials being added as they are discovered.

Overall, we found more data and a greater number and wider variety of formats than anticipated. The variety of digital tapes was the most surprising and also adds significantly to the total size of the materials. All told, approximately 6.89 TB of data was discovered through the process. Formats discovered and quantities of each are listed in the chart below.

Media Type                                   Quantity Found
3½” disk                                     202
5¼” disk                                     1
Zip disk                                     30
CD                                           1,553
DVD                                          476
Mini (“pocket”) CD                           44
USB flash drive                              5
Digital Audio Tape (DAT)                     1
Digital Video Tapes (5 different formats)    108
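As a rough illustration of how the exported survey spreadsheet could be tallied, here is a minimal Python sketch. The quantities are copied from the chart above; the dictionary layout and the grouping of CDs, DVDs, and mini CDs as "optical discs" are assumptions made for this example and are not part of the original survey form.

```python
# Quantities copied from the chart above.
media_counts = {
    "3.5-inch disk": 202,
    "5.25-inch disk": 1,
    "Zip disk": 30,
    "CD": 1553,
    "DVD": 476,
    "Mini CD": 44,
    "USB flash drive": 5,
    "Digital Audio Tape": 1,
    "Digital Video Tape": 108,
}

# ASSUMPTION: this grouping is for illustration only.
OPTICAL_TYPES = ("CD", "DVD", "Mini CD")

total_items = sum(media_counts.values())
optical_items = sum(media_counts[name] for name in OPTICAL_TYPES)
most_common = max(media_counts, key=media_counts.get)

print(f"Total media items: {total_items}")
print(f"Optical discs: {optical_items} ({optical_items / total_items:.0%})")
print(f"Most common format: {most_common}")
```

Item counts say nothing about data volume, though: the digital video tapes are a small share of the items but account for a large part of the total size.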


The vast majority of the media found were labeled in some fashion and many are filed with related material (e.g., proof sheets for photographs, directory printouts, published materials). However, some of the label annotations will likely prove too vague to be of much assistance for additional metadata. The materials date from 1986 to 2013.

One challenge discovered was noting the annotations written on each physical item. Due to time constraints and the proposed next-steps of the project, we decided that the notations of every label were not needed. This allowed the collections to be surveyed more quickly (e.g., a box with 219 CDs and 29 DVDs of student theses and dissertations was listed as two entries, rather than 248, by only noting that the majority of discs contained the student’s name and occasionally a paper title or file format designation). Specific annotations and metadata will be added during further phases of the project. Additionally, the specific location of digital media was noted in the survey, but no items were separated from their collections during this phase.

One takeaway from the project is the need to accurately document digital media at the moment of transfer. The current archivists have made advances in this area and newer collections are more accurately described. Another finding of note is the sheer volume of data in digital video formats. This is an important aspect to bear in mind for other digital initiatives. The future of this project will include transferring files to our server, processing collections, and, eventually, providing access to many digital files through Drexel’s institutional repository. We are currently working out guidelines to prioritize the order in which items will be processed as this project continues.

Processing After PACSCL

So, here I am in Anchorage, Alaska. It’s been roughly three months since I arrived and I’m slowly settling in. While my social life has all but disappeared here, my work life has been moving along at speeds apparently unanticipated. When I accepted the position, the plan was to process about 500 linear feet of records pertaining to the nation’s largest national park, Wrangell-St. Elias National Park and Preserve. When I arrived, I was shown about 100 feet of records to process, with the rest of the collection remaining in the park’s archives at a far off location (deemed unnecessary for this current project due to having been previously processed). Having now physically processed all of the boxes, removing unnecessarily duplicated documents, bulky binders and spiral bindings, and excessive amounts of empty space, I’m left with just over 76 linear feet of records. 500 feet down to 76 is amazingly reductive. And I still have (a proposed) 9 months left here.

Luckily, a plan was put in place in anticipation of this. After initially surveying the collection in early June, I spoke to my supervisor and let her know that at the speed I’ve been trained to process (4 hours per linear foot, thank you very much PACSCL), it should take only about 3 to 4 months to complete the records on the shelves. Even factoring in the government’s love of meetings, I’ve managed to remain under 3 hours per foot, so far (granted, I still need to do some writing and data entry, but the rough stuff is done). She decided that we would head back to the park (next week!) and get the rest of their records so that I can incorporate those into the new collection arrangement and make one, hopefully coherent, collection of all of the park’s records. Due to the Park Service’s penchant for item level cataloging, we’re not exactly sure how much is left – somewhere between 100 and 200 linear feet. And, there is another project in the works after that (depending on funding and my availability).

The point here is – plan ahead. Especially if you come from a fast-paced, minimal processing background, the archival world you are entering will more than likely expect you to move slower than you do. Former PACSCL project processors have found this to be overwhelmingly the case. Keep your supervisor informed. Don’t try to hide the fact that you are efficient and skilled. Work together to plan ahead. You’ll avoid sitting around with little to do, your employer will (I hope) be happy to get more accomplished than s/he had anticipated, and you may also prove that you are worth keeping on for a longer period than originally planned.

Reprocessing: The Trials and Tribulations of Previously Processed Collections

from the poster presented at the Society of American Archivists Annual Meeting, August 2014, Washington, D.C.

by Annalise Berdini, Steven Duckworth, Jessica Hoffman, Alina Josan, Amanda Mita, & Evan Peugh; Philadelphia Area Consortium of Special Collections Libraries (PACSCL)


PACSCL’s current project, “Uncovering Philadelphia’s Past: A Regional Solution to Revealing Hidden Collections,” will process 46 high research value collections, totaling 1,539 linear feet, from 16 Philadelphia-area institutions that document life in the region. Since the start of processing in October 2013, the team has completed 31 collections at 13 repositories, totaling over 1,225 linear feet. Plans have evolved over the course of the project due to previous processing in many collections. As the processing teams tackled the collections, the solutions devised for the various challenges they encountered developed into a helpful body of information regarding minimal processing. Future archivists and collaborators can use this knowledge to choose appropriate collections for minimal processing projects, and be prepared to handle unexpected challenges as they arise.


  • Novice Archivists: Volunteers and novice archivists, while well meaning, can make simple mistakes that lead to larger problems.
    • Learn about the previous processors: their background and level of knowledge with the materials. Having a better idea of their relationship to the collection helps guide decisions in the new iteration of processing.
    • “Miscellaneous.” It is a very popular word, even with seasoned archivists. Attempts should be made to more accurately describe the contents of a folder, such as “Assorted records” or “Correspondence, assorted,” followed by examples of record types or 1 to 3 names of individuals represented.
  • Losing Original Order: Processors with good intentions can disrupt original order through poor arrangement, item-level processing, and removing items for exhibits or other purposes.
    • Use what original order remains to influence arrangement in a way that might bring separated records back together.
    • Lone items may require more detailed description to provide links back to other documents.
    • Be aware of handwriting: Previous folder titling can serve as a clue for separated items and original order.
  • Item-Level Description: Item-level description can render the collection’s original order impossible to discern and greatly diminish access.
    • Gain a broad perspective of the collection in order to determine the most intelligible arrangement of materials with an awareness of grouping like with like.
    • For item-level reference materials, such as newspaper and magazine clippings, merge materials into larger subject files and include a rough date span.
    • Be cautious when merging other records, such as correspondence. Arrange materials into a loose chronological order and include in the folder title the names of recurring correspondents, if possible.
    • Make sure to account for the new arrangement in one’s arrangement note. Reuniting item-level materials and describing those materials to the new level of arrangement will greatly enhance access to the collection.
  • Legacy Finding Aids: It can be difficult to tell how accurate an existing finding aid is, and the decisions made on how much of it to preserve can be complicated.
    • Again, knowledge of the previous processors’ education and history with the collection will prove helpful.
    • Consider the fate of the legacy finding aid. If the collection will be entirely reprocessed, is anything in the legacy finding aid worth keeping? Should the old and new simply be linked or should parts of the old finding aid be incorporated into the new one?
    • Proofread! Anything retained from a legacy finding aid should be proofread very carefully.
    • Keep ideas of continuity in mind while creating new folder titles and dates.
    • Format can be a problem. Will the format (e.g., hardcopy only) prove problematic for import? Scanning and OCR can be a time-consuming process.
  • Collection Size and Type: Size and type of collection can have a drastic impact on processing speeds.
    • If possible, choose larger collections to economize on time and money. Multiple smaller collections require more effort than one larger one.
    • Institutional records average a faster processing speed than family or personal papers. Keep this in mind when choosing which collections to process.


  • Work closely with current staff; understand the history of the collection and the desired shape of its future.
  • Learn about previous processors to understand their training, background, and history with the records.
  • Edit and expand upon non-descriptive terms (e.g., miscellaneous) when possible. More detailed descriptions can assist in linking separated records back together.
  • Merge clippings and reference files together when feasible.
  • Make note of reprocessing decisions in the finding aid.
  • Proofread any reused documents or folder titles, keeping ideas of consistency in mind.
  • Be mindful of donor relationships in discussing past problems, especially in any public forum, such as a project blog.
  • Plan carefully from the outset. If possible, choose collections that best fit the project goals.
  • Remain flexible and be prepared to compromise.


"Reprocessing" poster for Society of American Archivists 2014 Annual Meeting
Average processing speed by collection size
Average processing speed by collection type

Learn more about the project at

On Collaboration

This post originally appeared on the PACSCL project blog.

The PACSCL Hidden Collections project involves a great deal of collaboration. We work with a processing partner each day. We exchange ideas and stories with the other processing teams. And we work with our project manager and the archivists and other staff at whichever repository we’re currently located. And lately I’ve been thinking a lot about this (mainly due to the work environment where I’m processing).

I am, quite frankly, frequently surprised at how much I enjoy all of this collaboration. For many years now, my ‘job’ hasn’t been something I truly enjoy. And due to that, I’d forgotten how that feels and had fallen into the stereotypical thought pattern of disliking ‘teamwork’ or group projects. Both of these terms had come to be associated with projects I never had much interest in or working with people I didn’t really connect with. Having been with PACSCL for 6 months now and ruminating on this idea of collaboration – and how I don’t hate it – it suddenly dawned on me that I didn’t use to think negatively of teamwork.

I have been a musician (a cellist) for almost 25 years. And one of the things I most love playing is chamber music. Though I never thought about it in this way before, being in a chamber group is an ultimate form of collaboration. Musicians know there is never one right answer – though there can often be wrong answers – and we work together to bring about the best final outcome. We combine our knowledge of our instruments, the composer, music and world history, and performance practice, as well as newer techniques and ideas, to make an amazing moment with every piece.

With archives, it seems much the same. We take our knowledge of archival theory and practice, our experience with research and patrons, and filter in new ideas as they come into play, and create access to collections in the most logical and constructive way we can. The dynamics of this project are especially beneficial to the collaborative practice. Students and recent graduates are processing under the direction of more experienced archivists in an environment that encourages us to speak out and exchange ideas, both with our peers and our mentors. So, though playing cello is no longer the central focus of my daily life, I’m very excited to have returned to a profession that can offer that same sense of community, joy, and accomplishment.

Minimal Deaccessioning

This post originally appeared on the PACSCL project blog.

The parameters of our Hidden Collections project generally preclude any deaccessioning efforts from being part of the process. We’re tasked with moving at a relatively swift pace – roughly twice the speed of “traditional” archival processing – and this doesn’t leave a lot of time to go through and check to see if some items could or should be removed from the collections. Additionally, being archival interlopers, fairly unfamiliar with the collections and procedures of our temporary homes, leads us to err on the side of caution and leave the task of deaccessioning for another time and, usually, another archivist. However, I’ve found that from time to time, some deaccessioning can take place with virtually no additional time taken for the process.

Obvious duplicates

A prime example of this came in the past couple of weeks with our collection at the Drexel University College of Medicine (DUCOM) Legacy Center Archives and Special Collections. At DUCOM, we are processing about 250 feet of materials in the Academic Affairs records group of Hahnemann University. This group is made up of many smaller collections of papers from administrators and faculty, as well as broader collections from academic units, assorted publications, and more. While processing each of these collections, we often noted files that we knew we had seen before and were obviously duplications, but due to time constraints and issues of provenance, we let this fact bother us momentarily and then moved on. But when it came to the series of publications, the rules changed a bit.

Deaccessioned publications

As the materials in the series came from a variety of smaller collections of publications, the aim was to file them all together, leaving issues of provenance out of the picture. And, as we decided to file them chronologically within four subseries, picking out the duplicates became quite simple during the final process of arranging and boxing. As can be seen in the accompanying pictures, duplicated publications were blatantly obvious. After a quick glance through each set of duplicates, we retained three copies of each, choosing the versions in the best condition or any annotated copies. The excess duplicates were removed from the collection and given to the main archivists, who will decide upon their ultimate fate. Though it may not seem like much in a collection of roughly 250 feet, we were able to remove over a foot of redundant material in this manner without slowing down our process. We consider this a win-win situation and recommend using this idea of minimal deaccessioning when possible with future collections.

Saint Peter Claver Roman Catholic Church records

This post originally appeared on the PACSCL project blog.

The records of St. Peter Claver Roman Catholic Church of Philadelphia, one of the collections held at Temple University’s Special Collections Research Center, shed light on a unique aspect of Philadelphia history. The church was started in 1886 when African American Catholics in the region grew tired of the discrimination they faced at Catholic Churches of the day (if they were allowed in at all). Members of three parishes united to form the Peter Claver Union with the goal of creating a “Church for Colored Catholics” in Philadelphia.

In 1889, they were officially recognized by the Archdiocese of Philadelphia, and in 1892, they moved into their new home at 12th and Lombard Streets (a former Presbyterian church). The church continued to function for almost a century until the Archdiocese suppressed the church in 1985, stating that due to the changing racial climate, a dedicated church for African Americans was no longer needed, thus removing their parish status, as well as all of their records. At this point, the church continued to function as a community, but could not offer most religious sacraments and services.

In processing the records of this collection, one obvious drawback is the lack of most records from before 1985 (outside of the school records). Rather than finding records focused mainly on the administration and rituals of a church, this collection’s focus is found in the community outcry over the suppression of the parish, clippings and other subject files covering the African American community at the time, the church community’s struggle to remain vibrant in a neighborhood that had lost its African American majority, and many issues of racism (real or perceived) within the Catholic Church as a whole.

From a processing perspective, this was my favorite collection from our time at Temple and that comes from it not having been previously processed. It was quite rewarding to take a box full of papers and create a logical order to the contents, rather than just relabeling folders or trying to figure out why someone had deemed certain records appropriate to folder together. This collection, though smaller than our previous ones, offered a chance to do some actual MPLP processing (a goal of this project), as well as learn more about Philadelphia history. And while I’ll not comment on my personal views of the acts of the Catholic Church regarding St. Peter Claver’s, it is quite eye-opening to read about this time in Catholic history.

Processing Up

This post originally appeared on the PACSCL project blog.  

The Hebrew Sunday School Society (HSSS) collection at Temple University’s Special Collections Research Center contains roughly 35 linear feet of records that span two centuries (1802 to 2002) and document the history of the Society. HSSS was founded in 1838 by Rebecca Gratz (a Jewish philanthropist in Philadelphia and the basis for the character of Rebecca in Sir Walter Scott’s Ivanhoe) with the intention that all Jewish children could attend classes regardless of financial standing or synagogue affiliation. The collection consists of administrative records, papers and programs from school teachings and functions, some very cool artifacts (e.g., lantern slides, a large hand bell used for fire drills, books and other items originally belonging to Rebecca Gratz), and many photographs.

In working with the collection, my processing partner (Annalise Berdini) and I came across a somewhat frustrating issue – that of attempting to minimally process a collection that had been previously processed to a much more detailed level. This collection, which consists of no fewer than 17 different accessions, had been processed by various people, and to varying levels. Additionally, a number of the more ‘eye-catching’ items had been used in an exhibit, so they had been somewhat separated from their contextual homes.

Hopping through the decades

Many folders were found to contain just one document, or perhaps a few. Others had a slew of records stretching back many decades, but hopscotching through the years like a child at play. It’s not uncommon to find a date span such as “1877, 1882-1888, 1906, 1910-1913, 1930-1959, 1965-1985.”

Other folders seemed to be making a summary of the entire collection, with one or two examples of each type of document from each series we’d constructed, leaving us frequently asking, “How do I label this and where does this go?” (Personally, I’m planning to petition for the word hodgepodge to be added as acceptable terminology since miscellaneous is out of the question.) And then there were the occasional appearances of spotty preservation work (though I can’t be sure when that occurred).

Spotty preservation practices

The folder titles were sometimes helpful, but with any number of people having created the folders over those many, many accessions, they were inconsistent. Some had specific titles (some VERY specific); some were quite vague (my favorite from the collection being “Miscellaneous, etc.”). Some had dates (often inaccurate); most did not. This all boiled down to a lot of folders being refoldered, all of which needed to be inspected for more accurate information, and this slowed down the process considerably. One day, I spent close to five hours making my way through just one linear foot of folders.

The takeaway from the HSSS records is in highlighting the fact that MPLP (or minimal processing, really, which is closer to what we’re doing in this project) is not suited to every collection. This collection, though not done to our current standards, had been previously processed and some sort of inventory did exist. As such, it was most likely not the best choice for this processing project (though we all enjoyed the content of the collection quite a bit). If a collection has already gone past minimal processing, it’s rather difficult to back that process up.

It is done.

Two and a quarter years ago, I embarked on a new chapter. After taking a look at my life and myself and sorting out where I wanted this life to go, I started in on the MSLIS program at Drexel University. Forty-five credits, 21 reports, 10 research papers, 3 annotated bibliographies, 7 exams, 9 assorted projects, and what must be at least 450 discussion board posts later, I have finished everything required and requested of me. And while there have been many times through the course of the program where I couldn’t help but ask myself, “WHY AM I LEARNING THIS?,” I am quite happy with my choice and the direction things are going now.

When I was thinking about what to do with my life after struggling for years to make it in music and finding myself stressed out and depressed and disliking the office assistant work I had fallen into, I thought about what sorts of things I like to do, outside of playing the cello. As simple as it sounds, I came to library science out of a desire to create order and efficiency. I like systems and I like things to be logical. And to me, that seemed like a good place to start. And then I chose to concentrate in archival studies because I like old things. It makes me laugh, now that I actually know about archival studies, but that was my thinking at the time.

People complain a lot about library school, and I can totally understand why, but it really can be what you make of it. My papers and projects generally focused on topics I took an interest in and wanted to explore more – music librarianship, copyright issues, LGBT history and activism, among others – and that kept potentially-tedious projects from becoming a chore. And now it’s over and I’m quite happy with the work I’m doing on the PACSCL/CLIR project. I think I may have actually found a good niche for myself in archives work. With any luck, a permanent position in the field will be found in the not-too-distant future. For now, I will just sit still and try not to start another project just yet.