Researchers Tap Google Books to Create the Word Cloud for Human History
Humanity’s legacy of millions upon millions of books represents an unparalleled reservoir of data, precisely detailing the changes in language and culture over the centuries. Now, if only a search engine giant were digitizing that history…
Oh, right. Google has been doing just that, and now scientists are beginning to tap that treasure trove of data.
Together with over 40 university libraries, the internet titan has thus far scanned over 15 million books, creating a massive electronic library that represents 12% of all the books ever published. All the while, a team from Harvard University, led by Jean-Baptiste Michel and Erez Lieberman Aiden, have been analysing the flood of data.
Their first report is available today. Although it barely scratches the surface, it’s already a tantalising glimpse into the power of the Google Books corpus. It’s a record of human culture, spanning six centuries and seven languages. It shows vocabularies expanding and grammar evolving. It contains stories about our adoption of technology, our quest for fame, and our battle for equality. And it hides the traces of tragedy, including traces of political suppression, records of past plagues, and a fading connection with our own history.
Do yourself a favor and check out the rest of Ed’s extensive post (including fascinating examples like the “half-life” of any given year being mentioned in literature) over at Not Exactly Rocket Science. And try out Google’s search to see the prevalence of any phrase or phrases over the years.
Image: Wikimedia Commons (New York Public Library)