The National Endowment for the Humanities, in partnership with the National Science Foundation, the Social Sciences and Humanities Research Council of Canada, and the Joint Information Systems Committee of the United Kingdom, has just awarded its first grants in the Digging into Data Challenge (H/T Wired Campus). The collaborative projects supported by these grants will dig into large data sets in search of new kinds of insights into humanities and social science research. As the organizers explain:
“With books, newspapers, journals, films, artworks, and sound recordings being digitized on a massive scale, it is possible to apply data analysis techniques to large collections of diverse cultural heritage resources as well as scientific data. How might these techniques help scholars use these materials to ask new questions about and gain new insights into our world?”
As my title suggests, accessing and analyzing large document collections can help provide an additional perspective to historical narratives. Consider letters home from soldiers, used to great effect in Ken Burns’ The Civil War. By analyzing their content, scholars can provide narratives based on the experiences of so-called ‘regular folks’ to put up against the correspondence of politicians and generals that often dominates war accounts, if for no better reason than that it was easier to read and analyze the correspondence of a few than the correspondence of many. This isn’t the case any more, or won’t be soon.
You’ll enjoy James M. McPherson, For Cause & Comrades: Why Men Fought in the Civil War (Oxford U Pr, 1997), which does pretty much what you propose.
Any kind of new funding for the humanities is great, and I’m sure there are a lot of fundable projects out there. On the other hand, I’m skeptical about whether “data mining” can (as of now) have a big impact. Humanists in general try to reconstruct meaning, and meaning is not (yet) a computer-detectable feature of texts or other cultural artifacts.
For example, I’d be interested in collecting a corpus of “climategate” commentaries, and searching to see who is using the word “hacked” v. “stolen” v. “leaked” to describe how the emails became public. But I’d still have to read the texts to identify the arguments being used. Even my most sophisticated AI friends haven’t figured out a way to detect arguments that doesn’t involve a human brain.
On the third hand, natural language processing is improving rapidly, so we may get there soon!
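The word-choice search described in this comment is the kind of thing a few lines of code can handle, even if the argument analysis is not. Here is a minimal sketch in Python, assuming the commentaries have already been collected as plain text keyed by source. The source names and sample sentences are made up for illustration, and `count_terms` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import Counter

# The three competing descriptions named in the comment.
TERMS = ("hacked", "stolen", "leaked")


def count_terms(corpus):
    """Return, for each source, a Counter of how often each term appears."""
    # Whole-word, case-insensitive match on any of the terms.
    pattern = re.compile(r"\b(" + "|".join(TERMS) + r")\b", re.IGNORECASE)
    return {
        source: Counter(match.lower() for match in pattern.findall(text))
        for source, text in corpus.items()
    }


# Hypothetical sample corpus; a real one would be loaded from files.
corpus = {
    "blog_a": "The emails were hacked. Hacked servers are a crime.",
    "blog_b": "Someone leaked the emails; the leaked files are online.",
    "blog_c": "The stolen emails were leaked, though some say hacked.",
}

counts = count_terms(corpus)
for source, tally in counts.items():
    print(source, dict(tally))
```

A tally like this only shows which word each source prefers. Working out what argument the word serves still takes a human reader, which is the commenter's point.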