As a staff member at a grant-making agency, I often review plans for, or hear excited conversations about, projects with extraordinary potential (best realized with an influx of NEH funds, of course!). Assessments of completed digital humanities projects come along less often. On Wednesday, April 23, NEH had the pleasure of welcoming Mike Toth and Doug Emery from the Archimedes Palimpsest project, who shared the wisdom they gained over years of overseeing the conservation, preservation, digitization, and transcription of a fragile and very valuable manuscript. The project itself is really, really, really cool, and it was fascinating to hear how program manager Mike Toth organizes the work of a diverse and geographically dispersed team of archivists, scholars, and scientists. With funding provided by the owner of the manuscript, the project team has accomplished much in its ten-year history.
I was most struck by the challenge of accommodating and leveraging the changes in technology that have occurred since the project began in 1998. For example, early images of the manuscript were taken with a 6.1-megapixel digital camera; in 2007, the imaging was redone with a 256-megapixel camera. The project has had to keep pace with changes not just in available software and hardware, but also in the standards for the metadata that is central to organizing the vast dataset, which now exceeds 2 terabytes.
Mike and Doug stressed that a key concern for the project is the long-term viability of the dataset beyond the lifetime of current technologies. Adhering to broadly accepted standards, using non-proprietary software, and keeping metadata records simple and flat are all ways to achieve that goal. With an eye to future computational resources, they also noted that although 2 terabytes of data is difficult to share today, they anticipate a future in which a dataset of this size will be far more manageable. Intrigued? An alpha release of the Archimedes Palimpsest bulk data is now available, with the final release scheduled for October 2009.