All posts by abigailnorris

Disk ≠ Image: Creative Solutions to Issues in Born-Digital Processing

Hello. It’s Abbie again, back with another installment on processing and publishing the Volz & Associates, Inc. collection. I recently had the opportunity to present on this collection at The University of Texas’ 2019 Digital Preservation Symposium, where I discussed some of my solutions to born-digital access and preservation issues.

In the course of processing this collection, some of the most significant roadblocks were related to problematic disk images and metadata generation. These revolved around two distinct media types: zip disks and CDs. The collection contains 85 inventoried zip disks. When these were imaged and analyzed in BitCurator, the report came back riddled with error messages. Even though the disk images had been generated successfully, something was keeping fiwalk from generating the correct metadata. This made it impossible to determine necessary information like disk extent, content, and the presence of possible PII or malware.

Results for the zip disk analysis. Not what you want to see!

When processing CDs, I noticed that a group of about a dozen disks all had exactly the same metadata. While some similarities are expected, I thought it was strange that so many disks from different projects could have identical specs. I analyzed the disk images and physical disks and found that they were all CDs of photos processed by the same company. The original content of the disks had been overwritten by the company’s built-in structural and technical metadata, making the disk images virtually useless for research purposes. However, when these disks were mounted in FTK Imager, a write-blocked environment, it became clear that the original images were still extant – they just weren’t being picked up in the disk images.

Optical disks with identical metadata.
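A check like the one that surfaced these disks could be sketched in a few lines of Python. This is only an illustration, not the BitCurator report itself: the function name, the disk IDs, and the metadata fields (`size_bytes`, `file_count`) are all hypothetical stand-ins for whatever values a real report provides.

```python
from collections import defaultdict

def group_identical_metadata(disks):
    """Group disk IDs whose metadata records are exactly equal.

    `disks` maps a disk ID to a dict of metadata fields. Returns a list of
    sorted ID lists, one per group that contains more than one disk.
    """
    groups = defaultdict(list)
    for disk_id, metadata in disks.items():
        # Sort the items so that field order does not affect the comparison.
        key = tuple(sorted(metadata.items()))
        groups[key].append(disk_id)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)

# Hypothetical sample records; the field names and values are invented.
sample = {
    "cd_001": {"size_bytes": 1000, "file_count": 12},
    "cd_002": {"size_bytes": 1000, "file_count": 12},
    "cd_003": {"size_bytes": 2048, "file_count": 40},
}
print(group_identical_metadata(sample))
```

Any group the function returns is a set of disks worth inspecting by hand, since identical specs across unrelated projects suggest the image captured something other than the original content.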

In both of these cases, I knew that the original content could be reached but didn’t know how to access it while adhering to the best practices established by the digital archiving community and working within the resources at my disposal. The disks with these issues amounted to nearly one-fifth of the imaged collection, and many hours of research and testing were put into how I could reprocess this information. Ultimately, I found myself faced with a choice – I could either preserve inaccurate representations of the disk by retaining the existing disk images or preserve inaccurate representations of the metadata by extracting extant files instead of preserving a disk image. The issue with preserving the metadata as it stood was that it was wrong to begin with, and the resources at hand didn’t provide a solution.

When thinking about how best to reprocess this information, I was drawn to this quote from a study undertaken by Julia Kim, the Digital Assets Manager at the Library of Congress:

“Most of the researchers emphasized that if it came to partially processed files or emulations and a significant time delay in processing, they would take unprocessed and relatively inauthentic files. Access by any means, and ease of access were stressed by the majority.”

“Researcher Interactions with Born-Digital: Out of the Frying Pan and into the Reading Room,” saaers.wordpress.com

While we certainly don’t want to prioritize patron opinion over archival standards, this quote made me question whether sacrificing access and preservation to retain imperfect metadata could truly be considered “best practice.”

Working closely with the UT Libraries Digital Stewardship department, I created a workflow whereby I extracted files using FTK Imager and generated SIPs and a bulk_extractor report using the Canadian Centre for Architecture’s Folder Processing Tool in the BitCurator environment. The only impact on the metadata was a change in the “Date Modified” field, which now showed the date the files were extracted and could be amended to reflect the most recent date in the file tree. While this is an unconventional approach to digital preservation, the resulting AIPs and DIPs are better representations of the original disks and will allow us to provide more comprehensive data for future researchers.
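The “Date Modified” amendment can be illustrated with a short Python sketch. It rests on one reading of the step above, namely that the top-level extracted folder should carry the date of its most recently modified file. The function names are hypothetical, and this is not part of FTK Imager or the Folder Processing Tool.

```python
import os
from pathlib import Path

def latest_file_mtime(root):
    """Return the most recent modification time among files under root, or None."""
    mtimes = [p.stat().st_mtime for p in Path(root).rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None

def amend_folder_mtime(root):
    """Set root's "Date Modified" to that of its most recently modified file.

    Assumption: only the top-level folder is amended; the files themselves
    are left untouched. Returns the timestamp applied, or None if the tree
    holds no files.
    """
    latest = latest_file_mtime(root)
    if latest is not None:
        # os.utime takes (access time, modification time) in seconds;
        # the existing access time is kept as it was.
        os.utime(root, (os.stat(root).st_atime, latest))
    return latest
```

Whatever form the amendment takes, the original extraction date is worth recording in the processing notes so the change stays documented.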

The amended workflow.
Our new results.

After my presentation, I had the opportunity to talk to a digital archivist about this workflow. We discussed how best practice is ultimately not about perfection, but about preserving and providing access to our materials and documenting the process. Although it contrasts somewhat with the theoretical approaches I’d learned in class and adapted from online resources, this approach seemed more natural to me. It affirmed both my processing decisions and the opinions I’ve developed about what best practice in the archival community is and should be. This process has shown me how vital the user is to the archive, especially when developing workflows for digital materials.

As I mentioned in my previous post, I love having opportunities to exercise creativity and problem solving in this position. It’s even better when those opportunities lead to breakthroughs that help me grow professionally and enhance the data we provide patrons. I’m excited to see new developments in our born-digital workflow as we get closer to making this collection available to patrons. Check back soon for a (final?) update from me – I hope you’ve enjoyed learning more about the behind-the-scenes work of born-digital archiving.

Consider the Floppy: Exploring Access Issues with the Volz & Associates, Inc. Collection

A black 3.5 inch floppy disk with a label reading "RJA: Geomsn.zip"
A floppy disk from the Volz & Associates, Inc. collection.

Hello! This is Abbie Norris, back with an update on the born-digital Volz & Associates, Inc. collection. For those who haven’t read my previous blog post, I am the digital archives Graduate Research Assistant at the Alexander Architectural Archives. I’m currently working on the Volz & Associates, Inc. collection, which documents the work of a historic preservation firm based in Austin, Texas.

When I published my previous post, the collection was in the midst of being processed. I’m happy to report that processing for this collection is complete – all 813 floppy disks, CDs, zip disks, and flash drives of it. Processing is one of the first major stages of getting a collection from the donor to the public. It’s when the bulk of archival preservation happens. In this case, as in many born-digital collections, “processing” involved imaging (essentially, copying) the disks, capturing metadata like disk size and file types, and recording everything for documentation in the finding aid. We’re now able to determine the size of the collection, the types of files, and what we need to provide access to them.
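For readers curious what “capturing metadata like disk size and file types” can look like in practice, here is a minimal Python sketch. It assumes the contents of a disk are available as an ordinary folder (for example, a mounted image or a set of extracted files). It is not the BitCurator workflow used for this collection, and the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path

def summarize_tree(root):
    """Return (total size in bytes, Counter of lowercase file extensions)."""
    total_bytes = 0
    extensions = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            total_bytes += path.stat().st_size
            # Files without an extension are counted under "(none)".
            extensions[path.suffix.lower() or "(none)"] += 1
    return total_bytes, extensions
```

Calling `summarize_tree` on a folder returns the total extent in bytes together with a tally of extensions, which is the kind of information a finding aid needs. Extensions are only a rough guide to format, so a real workflow would lean on a format identification tool instead.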

One of the things I love about born-digital archiving is that the problems that arise require creative solutions. This is especially true for an archive’s pilot born-digital collection, as is the case for Volz. Items like CDs and floppy disks degrade at a faster rate than paper materials, meaning that sometimes you try to open a disk that physically appears fine, and it won’t show any of your files. One major question people have is, “If all of the information is stored on a CD, why do you have to copy the contents into a disk image? Why can’t you just continue opening the CD to access the contents?” Luckily, the answer is simple.

Imagine you have a 13th century codex and a 1990s floppy disk. Which one is easier for you to read? With the codex, all you have to do is open it. It will be fragile and you might not know the language in which it’s written, but much of the information held in the book will be visible to you. Now, consider the floppy disk. When was the last time you used one? Does your computer still have a floppy disk drive? More than likely, the answer is, “No.” Even if it does, think about the files on that disk. Can you still open a WordPerfect document from 1992?

Given that the Volz collection dates between 1980 and 2009, the types of files present on the disks vary widely. Some, like .txt and .tif files, are still widely accessible and are projected to remain active filetypes in the future. Others, like .jpeg, are still accessible but are not recommended for preservation because of their lower quality and the potential for their use to cease. Finally, there are the files that you try to open with modern software…and nothing works. These files were created either with old versions of proprietary software or with software that has since been discontinued.
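The three groups above can be expressed as a simple lookup, sketched here in Python. The tiers mirror the description in this post; the extension sets (including the added `.tiff` and `.jpg` spellings) and the tier labels are illustrative assumptions, not an institutional format policy.

```python
from pathlib import Path

# Tiers follow the three groups described in the post; the extension sets
# are illustrative assumptions, not a format policy.
WIDELY_ACCESSIBLE = {".txt", ".tif", ".tiff"}
ACCESSIBLE_NOT_PREFERRED = {".jpeg", ".jpg"}

def access_tier(filename):
    """Return a rough access tier for a file based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in WIDELY_ACCESSIBLE:
        return "widely accessible"
    if suffix in ACCESSIBLE_NOT_PREFERRED:
        return "accessible, not recommended for preservation"
    # Everything else, including legacy proprietary formats, needs a closer look.
    return "needs investigation"
```

Anything that falls into the last tier is where the creative problem solving begins.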

This is where creative solutions come in. There are a variety of tools, many borrowed from the criminal forensics world, that allow us to look at files from twenty years ago. Because the digital archive field is still developing and many of the projects use open access tools, the software an archive uses to read and provide access to old files can resemble a patchwork quilt. Now that I know exactly what types of files are in the collection, I love exploring access solutions and finding answers to questions that have persisted since I began working here.

In many ways, finishing processing feels like finishing the first segment of a relay race. I feel accomplished for finishing a major task, but there is still a long way to go. Now that processing is complete, we have to finish writing the finding aid and establish methods for researchers to access the collection. It’s going to be an exciting few months, so check back here to learn about what providing access to a born-digital collection looks like at the Alexander Architectural Archives.

The Volz & Associates, Inc. Collection: Born-Digital Initiatives at the AAA

One of the many structures VOH Architects worked on: the Littlefield House on the University of Texas at Austin campus.

Hello! My name is Abbie Norris, and I am the current digital archives Graduate Research Assistant at the Alexander Architectural Archives. My primary job is processing the born-digital content received in the Volz & Associates, Inc. collection. This collection contains the records of the Volz & Associates, Inc. architecture firm, which is focused primarily on preserving and restoring historic buildings and interiors. The collection showcases notable buildings from Texas and United States history and is an excellent resource to discover how much is needed to keep historic buildings authentic and alive.

A gray CD reading "Images for Volz: Elisabet Ney Museum, April 2007"
A sample CD from the collection. Born-digital archiving requires preservation in two ways: retention of the original media and capture of the data for long-term storage.

The Volz Collection is significant to the Alexander for several reasons, but most importantly, it is the archive’s first large-scale born-digital accession. In addition to analog records and building materials, the collection includes roughly 450 floppy disks, 250 CDs, 90 zip disks, and one lone flash drive. These materials document the life of the firm from the early 1980s to the mid 2010s. So far, we have imaged over 100 filetypes representing everything from office files to construction reports to historic photographs. It’s a diverse array, and as the project moves forward, we’re faced with many questions about how best to provide access to researchers.

As diverse as the filetypes are the kinds of buildings included in the collection – though many are tied by one important identity. Volz worked on buildings of many functions, styles, and preservation needs. While these buildings span the United States, the majority of them are located in Texas. Included are the Governor’s Mansion, the Alamo, the Lyndon B. Johnson Ranch, and the Alexander’s own Battle Hall. I love working with this visual representation of Texas history. Whether it’s by noticing design similarities between county courthouses or the way historic landmarks are used and maintained, the collection is an in-depth look into how architecture shapes our state and its identity.

Scaffolding covers a green dome atop a white tower.
Restoration underway on the Colorado County Courthouse dome. Photo credit: Volz O’Connell Hutson Architects (http://voharchitects.com/projects/colorado-county-courthouse/).

In my four months of working with this collection, I’ve learned an incredible amount about both the intricacies of born-digital archiving and the breadth of work architects do. Through the frustration of software bugs and the triumph of imaging previously unreadable disks, this is a fascinating collection that provides many learning opportunities.

The next steps of the project are to finalize the creation of a finding aid for these born-digital materials and to determine methods of access once the collection is published. Check back here soon for collection updates and an in-depth look at the world of born-digital archiving at the Alexander Architectural Archives!