Data Mining As Complement to “Great Man” History

The National Endowment for the Humanities, in partnership with the National Science Foundation, the Social Sciences and Humanities Research Council of Canada, and the Joint Information Systems Committee of the United Kingdom, has just awarded the first grants in the Digging into Data Challenge (H/T Wired Campus).  The collaborative projects these grants support will dig into large data sets in search of new kinds of insight into humanities and social science research.  As the organizers explain:

“With books, newspapers, journals, films, artworks, and sound recordings being digitized on a massive scale, it is possible to apply data analysis techniques to large collections of diverse cultural heritage resources as well as scientific data.  How might these techniques help scholars use these materials to ask new questions about and gain new insights into our world?”

As my title suggests, accessing and analyzing large document collections can add another perspective to historical narratives.  Consider letters home from soldiers, used to great effect in Ken Burns’ The Civil War.  By analyzing their content, scholars can build narratives from the experiences of so-called ‘regular folks’ to set against the correspondence of politicians and generals that often dominates accounts of war, if for no better reason than that it was easier to read and analyze the correspondence of a few than the correspondence of many.  That is no longer the case, or soon won’t be.


3 thoughts on “Data Mining As Complement to ‘Great Man’ History”

  1. You’ll enjoy James M. McPherson, For Cause & Comrades: Why Men Fought in the Civil War (Oxford University Press, 1997), which does pretty much what you propose.

    Any kind of new funding for the humanities is great, and I’m sure there are a lot of fundable projects out there. On the other hand, I’m skeptical about whether “data mining” can (as of now) have a big impact. Humanists in general try to reconstruct meaning, and meaning is not (yet) a computer-detectable feature of texts or other cultural artifacts.

    For example, I’d be interested in collecting a corpus of “climategate” commentaries and searching to see who uses the word “hacked” v. “stolen” v. “leaked” to describe how the emails became public. But I’d still have to read the texts to identify the arguments being used; even my most sophisticated AI friends haven’t figured out a way to detect arguments that doesn’t involve a human brain.

    On the third hand, natural language processing is improving rapidly, so we may get there soon!

  2. Pingback: High Performance Computing and the Humanities « Pasco Phronesis

  3. Pingback: Digging Into Data Is Back For More « Pasco Phronesis
