Building a Declassification Engine

Over at Columbia University, a few historians and computer scientists are working on applying computing power to the declassification of government documents.  They call their planned suite of applications the Declassification Engine.  While the specific applications have different purposes, the general idea is to use machine learning to sift through the growing volume of not-yet-declassified documents held by various government archives.  I wasn’t able to grasp exactly why this is happening, but apparently archivists are destroying a large number of documents before the public can access them.  The organizers of the Engine hope that their work can help minimize that possibility.

While the researchers expect to be able to predict certain things about classified documents based on this work, they do not have access to classified material (in other words, these folks are not gathering data in some of the ways that WikiLeaks has).  The idea is that analysis of declassified documents can provide some sense of the patterns and word choices that could be expected in similar, but still classified, materials.  As the project continues, scholars with their own archives of declassified material are welcome to submit copies for use by the Engine.

A conference on the Engine will be held this Friday, May 10, at Columbia.  The agenda gives a sense of the work going on in this kind of document mining.  Beyond learning more about what gets classified, the project could help us figure out the rules for classification, and – what I think could be much more interesting – how those rules have changed over time.