Elizabeth’s last “Behind the Scenes” post on our file nomenclature has a flip side: “Once you name them, how do you organize those scans on the server side?”
Two weeks ago I received what I’ve kiddingly called a “cease and desist” order from our IT department. At seven minutes before closing, I learned that our “projects” drive—where we’d been storing our scans from the Morton collection (among other collections)—was full. I knew this and had already started working on the problem earlier in the day. The friendly caller from the Library IT department was nicely adamant, however, in his insistence that before I left the office that evening I had to remove at least 10 gigabytes of data, and continue the process the next day. The Morton scans had already amounted to 140 gigabytes; respectable, but not immense . . . unless you do not have the room!
The Library’s long-term storage configuration, the Digital Archive, has lots of disk space, but that server is designed for little to no alteration of files, file names, or directory structures once files get deposited there. That’s with good reason: you don’t want to be accidentally renaming, deleting, or copying over files intended for long-term storage. Our short-term storage is the projects drive, a server meant to be used as an interim holding tank for scanning projects before transferring files to the Digital Archive.
Scanning as a processing tool represents the antithesis of the long-term storage model; “in process” scans are not necessarily ready for prime time because we don’t know what many of the images depict. By scanning negatives, for example, we get to see the images to aid in identification. We also have the possibility of matching up related negatives or slides that have been separated as their relationships are rediscovered. The resultant scans, however, can set on the server for a long time before any file name or directory manipulation might take place, and that runs contrary to the purpose of the project drive.
So scanning during processing fits neither storage model. Well, where (and how) are we going to put 140 GB of data? That’s not a huge amount on the face of it, but it is only going to keep growing, so we needed a file structure that would handle that growth, and one that the IT department could accommodate on the systems side, too.
Here’s what emerged:
The diagram above shows the directory structure for the Morton scans. The top-level directory is P081_Morton, with five (for now) subdirectories: NTB (black-and-white negatives), NTC (color negatives), PRB (black-and-white photographs) , PRC (color photographs), and PTC (color transparencies, which are positives). Each subdirectory has a number of subdirectories that further refine the category, as do those subdirectories have additional subdirectories. The following diagram illustrates the file structure for color transparencies, showing the mounted slides at the base level:
An illustration showing individual scans for 35mm slides, then, looks like this:
On the server side, we have re-conceptualized the notion of the Digital Archive. In the long run, we may end up with a third type of storage for longer term projects with sizable and flexible storage requirements that allow us to rename files, move them around into different directories, et cetera, and then formalize them before they are moved into more permanent storage. Until then, IT is unlocking the restrictions placed on the Digital Archive to accommodate our immediate needs. (And we are not the only ones scanning lots of material!) In the final scenario, the university’s Institutional Repository will be the ultimate long-term preservation storage solution, but that is a few years in the offing. For now, we’ll be using the approach above . . . and keep on scanning!