Digitizing the Archives

       The academic, with her round glasses, elbow patches, and knitted brows, has long been associated with the archive and its cascading shelves of rare books, ephemera, and microfiche. She sifts through old, leather-bound diaries and pores over handwritten Congressional proceedings. Adjusting her glasses, she pauses to take notes with a small graphite pencil and yellow notepad, marveling at what she has found under the dull glow of what seems like the building’s only working lamp. This is it, she thinks, my evidence!

       While the inquisitive historian sifting through paper mounds is more of a caricature than reality, many disciplines depend on archival research to some degree. Researchers, whether they are affiliated with a university or not, travel to archives to see, touch, and hear from the past and its protagonists. Most of the time, scholars have done their background research. Secondary sources have illuminated for them which libraries and historical societies they should visit, and, more specifically, which collections to look at. In the archives — which can range from the Library of Congress to the Morris County Public Library — researchers get a chance to look for themselves. Though they arrive in the reading room with their own experiences and perspectives, which of course shape the way they deal with and interpret sources, it is their unique interpretation, their fresh set of eyes, that furthers extant scholarship.

       Last summer, I did archival research for my senior honors thesis in history. I studied (or wanted to study, as my research question shifted dramatically over time) worker radicalism during the 1913 Paterson Silk Strike. To do so, I spent time at five diverse archives: the Walter P. Reuther archive at Wayne State University, New York Public Library, Tamiment Library and Robert F. Wagner Labor Archives at New York University, Passaic County Historical Society, and Paterson Public Library. Some of these places had reading rooms and clear-cut procedures, and they were always full of researchers ready to do the same work I was about to do: look at old and important stuff. Others were community centers, public libraries, museums, or genealogical centers first and archives second, and their collections were often unsorted and untouched.

       Yet, going to any archive — be it NYU’s substantial labor history collection or the desk drawers of the American Labor Museum — is a tremendous privilege. It takes a lot of time, even with finding aids, to look through materials. Between transportation and lodging, not to mention photocopying fees, it becomes a big economic, physical, and emotional cost to scan through just a few boxes of material.

       Ultimately, as important as archives are to a broad swath of disciplines and professions, access to them is limited to certain populations, particularly well-funded researchers with time, graduate student employees, and people with some kind of institutional affiliation. This is not something we should be comfortable with, considering scholarship and the marketplace of ideas work best when we allow and encourage folks from different, underrepresented backgrounds to participate. This is an epistemological problem as much as it is an ethical one; we miss out on good knowledge when we exclude in this way.

       Enter digitization. With 21st-century advances in photography, scanning, and online storage, turning physical archival materials into accessible, virtual bits is not only possible; it is quickly becoming the preferred method of storing, processing, and analyzing primary sources. Digitization expands access to archival material across the board: to folks who cannot leave their homes, to folks who cannot afford to get to the archive, to folks who cannot find the time to travel. Researchers with minimal, if any, funding can now venture outside of their locale to write on a topic that genuinely interests them (provided, of course, those sources have been made available online). Digitization equalizes the archive. It invites populations long shut out of the reading room to grapple with the historical record on their own terms and with their own unique set of eyes.

       Even experienced researchers who can make the trip to the Boston City Archives in Roxbury, or the San Francisco Court Historical Society, or Buckingham Palace are thankful for the move to digital archives, particularly for the search and date-restriction functions that make locating sources quicker and more efficient.[1]

       Moreover, digital archives can work nicely with citation managers, a type of tech that humanities and social science scholars are enthusiastically embracing. Cutting down on time spent recording and inputting citation metadata means more time for the work machines can’t do: critical textual analysis. Researchers want to devote most of their time to working out how Document X might fold into their argument and what Document Y actually means in context.

       Another benefit of digitizing archives is the most obvious one: preservation. Most archival material is old, some of it very, very old (one surviving Jewish prayer book, the Siddur, dates back 1,173 years). Of course, certain material is durable, such as 18th-century rag pulp paper, and holds up well in the archive (the far more brittle wood or straw pulp paper replaced rag pulp in the 19th century).[2] Composition aside, continuous handling has the potential to ruin any important source, rag or straw, which is why archivists rushed to load especially vulnerable material (newspapers, most commonly) onto microfilm and microfiche in the 1960s and ’70s.[3] Digital data has a certain immortal quality that seems promising, though we know that data also takes up physical space and can be corrupted, even lost entirely.[4] Wildfires, storms, floods, and rough handling must work a lot harder to destroy digital archives.

       Digitization can even enhance documents that have deteriorated over time, restoring what many believed was permanently lost. Photos and film have benefited tremendously from this process, as have the people who study those materials. Depending on a site’s capabilities, online databases sometimes even allow researchers to zoom in and out of, brighten, and sharpen documents on their own. Archival material thus becomes clearer and more legible, and research becomes faster and more efficient.

       The case for digitization is a strong one. The ends (greater access to and stronger preservation of the archive) are noble, after all, so we should ride rather than resist technological advances even as we study the past. But archival digitization is an extremely costly process. It is not as simple as scanning a letter with a Canon all-in-one (as good as that Canon might be), and it takes time, manpower, and money.

       In the “imaging” stage of the process, archivists need to make decisions about how best to digitize a particular piece of the past: the right tool might very well be the all-in-one or, for a map, a large-format scanner, a camera, or some other kind of equipment. They certainly need to have access to particular tech, and they need to be familiar with that equipment so they can do a document justice in its digital form. To attain this kind of expertise, archivists typically need to have completed a course or program in preservation or museum studies. But not every archive can afford to hire trained, full-time staff. Smaller libraries employ part-time workers and volunteers. Historical societies have interns who take care of their collections (I was one such intern at the Morris County Historical Society and the American Labor Museum in the summer of 2017). When digitization does happen in those archives, it’s usually a very slow process, done between projects or exhibits with little oversight and quality control, and it’s typically managed by one person with little to no archival experience.

       Secondly, the tech to digitize is remarkably expensive. One large-format scanner, the Kirtas Skyview 3525, goes for $68,000, and book scanners typically run between $8,000 and $38,000.[5] For smaller archives, these are huge price tags to pay for high-quality digital images. In fact, even larger archives don’t undertake digitization projects unless they have received a grant to do so. Also take into account the expenses tied to creating and maintaining a website that stores the digital archives and runs the functions I mentioned earlier (zooming in, searching by date, sharpening the image, et cetera). Coders, good ones at least, have to build that sort of thing, and they certainly don’t come cheap. Whereas physical archival research places the cost burden on researchers themselves, digitization places the cost burden on the archive, and while some can and do pay the price, smaller institutions aren’t as lucky.

       What is more, when we make the move to digital, we do sacrifice something significant: that connection to the past that we get when we come into physical contact with a real piece of history. And it’s not just about the aesthetic value of a penned letter, or a drawing, or a pamphlet from 1903. Real knowledge gets lost in translation. That is, as sharp and clear as the data might be, we do miss, obscure, and erase information when we digitize our archives. Archivists spend time organizing and sorting collections so that documents make sense in context, and a lot of context depends on the physical characteristics of the collection: what is in what folder, what is in what box, what is stuck or stapled or pinned to what. Often, these seemingly minor details (the smell of a document, its tears, its place in a given collection) don’t translate online.

       Moreover, archivists exercise enormous power over historical memory when they sort X document into Y collection, when they decide that something is worth preserving at all. Because of these highly regimented systems, researchers interested in a particular time period or theme are exposed to a very specific set of documents, some related to their research question and some (or, more accurately, most) not. The documents they look at, read, and analyze shape not only their argument, but the research question itself. Organization, what is grouped with what, matters.

       Computer algorithms make sorting decisions based on featured items, terms, and themes. Documents begin to relate in a different way; they are no longer confined to one folder or box. Researchers are thus exposed to a different set of items than they would be at the physical archive. This can be a bad thing, as Nicholas Till, Professor of Opera and Musical Theatre at the University of Sussex, notes in his Times Higher Education piece on archival research in Italy:

       Turning the page right-side up, it was evident that the scribbles were the result of Cicognini testing a new pen. But there were also some more carefully written words, in the fancy humanist script used by Cicognini for headings... And then a name jumped out at me: “Galileo.” Although a number of people have looked at this manuscript, no one has commented upon (or perhaps even noticed) this startling reference…[t]he discovery of this reference to Galileo in the manuscript for Cicognini’s music drama is for me further testament to the intertwining of opera and science at this time. Would someone making a digital scan of a document like this notice some upside-down jottings at the back of the manuscript, or think to copy them?[6]

       Yet, to quote another historian, Michelle Moravec, on digital archives: “the question is not should historians use digitized archival objects. They will.” The real question is how “historians [will] grapple with the implications of working in digital archival environments” because we cannot simply “treat them as virtual equivalents to physical archives.”[7] This is a new animal, and while we run the risk of losing groundbreaking “upside-down jottings,” these new arrangements and categories might also inspire new ways to think about primary sources and, by extension, the arguments and questions they give rise to.

       In that case, digitization need not replace the physical archive. Instead, it can supplement it as an additional method of acquiring knowledge about a person, place, proceeding, or project in its own words. In order for the physical and digital archive to exist side-by-side, of course, we need to focus our efforts on funding: first for researchers, particularly those who cannot afford to go to the archive, and second for digitization projects at smaller archival institutions. This includes lobbying the federal government, nonprofits, and universities to focus more of their financial resources on important humanities research, a tall order certainly, but not impossible.

       Two of my thesis’ most important sources — a local newspaper that covered the strike every day and a 700-page Committee on Industrial Relations report — came from a small local library archive and the University of Michigan library website, respectively. Without this mix of print and digital, physical and not, I would not have come to the same conclusions. I would not have written the same paper. I probably would not even have asked the same question. Indeed, as researchers continue to look through the archive and search documents online, we will get better and more diverse scholarship, more questions and certainly more answers, that embraces, but does not bow down to, the digital age.


[1] The National Archives: Why Digitise; University of Miami Digital Collections allows users to search by date.

[2] Changes in Print Paper During the 19th Century

[3] Microfilm: A Brief History

[4] Why Don't Archivists Digitize Everything?

[5] What We Use to Digitize Materials

[6] Out of office: on research leave in Florence

[7] How digitized changed historical research