Wednesday, February 27, 2013

Dating Homer

From Geneticists Estimate Publication Date of the "Iliad."
Of course, publication is not exactly the term one would use for an oral work, which, as the research shows, seems to have grown out of various other oral traditions that go back another 500 years or so before the "publication" date. Still, the language itself served as the bread crumbs that mark the trail back to when the compilation of stories known as the Iliad became set in the form that has been passed down through the generations.
"Languages behave just extraordinarily like genes," Pagel said. "It is directly analogous. We tried to document the regularities in linguistic evolution and study Homer's vocabulary as a way of seeing if language evolves the way we think it does. If so, then we should be able to find a date for Homer."

The date they arrived at was 763 BCE, give or take 50 years.

The researchers employed a linguistic tool called the Swadesh word list, put together in the 1940s and 1950s by American linguist Morris Swadesh. The list contains approximately 200 concepts that have words apparently in every language and every culture, Pagel said. These are usually words for body parts, colors, necessary relationships like "father" and "mother." They looked for Swadesh words in the "Iliad" and found 173 of them. Then, they measured how they changed.
They took the language of the Hittites, a people that existed during the time the war may have been fought, and modern Greek, and traced the changes in the words from Hittite to Homeric to modern. It is precisely how they measure the genetic history of humans, going back and seeing how and when genes alter over time.
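
To make the idea concrete, here is a toy sketch in Python of the simplest form of lexical dating, the classic glottochronology formula that grew out of Swadesh's lists. It is not the statistical model Pagel's team used. The function name, the retention rate, and the word count in the example are all illustrative assumptions of mine, not figures from the study.

```python
import math

def years_elapsed(shared, total, retention_per_millennium=0.81):
    """Estimate the time separating two stages of one language.

    Assumes a constant fraction of core vocabulary survives each
    millennium (an illustrative rate, not a figure from the study).
    """
    if not 0 < shared <= total:
        raise ValueError("shared must be between 1 and total")
    fraction_kept = shared / total
    # fraction_kept = rate ** t  =>  t = ln(fraction_kept) / ln(rate)
    millennia = math.log(fraction_kept) / math.log(retention_per_millennium)
    return millennia * 1000

# Hypothetical input: suppose 96 of the 173 Swadesh words found in the
# Iliad still had a recognizable descendant in modern Greek.
print(round(years_elapsed(96, 173)))
```

The fewer words that survive, the more time the formula says has passed. The real research replaces the single fixed rate with a statistical model of how individual words change, which is where the "give or take 50 years" comes from.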

Monday, February 18, 2013

Don't tell me what to like or re-post

I'll decide on my own what I like or wish to share. I find any attempt to divide people into good and bad teams, based on whether or not they choose to promote a post, an insult to my intelligence.

When I see a post that includes the words "Like if you ..." or "Share if you ..." the last thing in the world I want to do is like or share. Not only do I not like the suggestion of chain letters inherent in such exhortations, but the posts themselves are often pointless.

For example, one of my Facebook connections put up the following picture post:

Really, this is beyond absurd. Why not then have "Like if you wish AIDS/stroke/dementia/asthma/diabetes/tuberculosis/malaria didn't exist"? In fact, you can put in "Like if you wish flat tires didn't exist" or "Like if you wish blackouts (especially during Super Bowls) didn't exist."

Another Facebook connection put up the following, including the odd capitalization, shift from noun to adjective in "spousal" and use of "anytime" when "anything" was likely the word intended:

Abuse of anytime is Despicable - Animal , child or spousal

In other words, if I don't re-post that chain letter in a jpg, I prove I don't have a heart. Very intelligent way to promote your cause. And just how will spreading this post help protect any child, spouse, or animal from abuse?

I see examples like these as social media at its worst in terms of equating a share with real care. People believe they are doing something for a worthy cause when, in fact, their actions do nothing to improve the situation. Liking and sharing do not contribute to safety, prevention, or research. They just allow people to show, with nothing more than a click, that they consider themselves sensitive and caring individuals.

Wednesday, February 13, 2013

The Big Bow-Wow & a Bit of Ivory

[This blog post originally appeared on Big Data Republic in 2013. Unfortunately, all the content has been taken offline.]

Sir Walter Scott contrasted his style of writing with that of Jane Austen: "The big Bow-Wow strain I can do myself like any now going; but the exquisite touch which renders ordinary commonplace things and characters interesting from the truth of the description and the sentiment is denied to me." While he characterized his work as large, Jane Austen called her own small, a "little bit (two inches wide) of ivory on which I work with so fine a brush."

Seeing themselves as such strong contrasts to each other, they likely would have been very surprised to be coupled together as "the literary equivalent of Homo erectus, or, if you prefer, Adam and Eve." Using computational power to analyze 3,592 works published between 1780 and 1900, Matthew Jockers concluded that Walter Scott and Jane Austen were the two primary influencers of all novelists who came after them in terms of style and theme. Those are the types of discoveries that Jockers expounds upon in his newly published book, Macroanalysis: Digital Methods and Literary History.

Systematic textual analysis has a history that goes back much further than computers. The first concordance, according to The Word Crunchers, dates back about 800 years. It was a most labor-intensive project, taking up the work of 500 friars. A Chaucer concordance took 50 years until it was ready for publication in 1927. Computers entered the picture as early as 1951 when "I.B.M. helped create an automated concordance." Those were the days of punch card programming, so “indexing all of Aquinas took a million man-hours.” It was only completed in 1974. Ten years later, though, computers could analyze texts effortlessly, as depicted in the reports of a novelist’s favorite word in David Lodge’s novel Small World.
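
For a sense of what those friars were doing by hand, here is a minimal sketch of a concordance in Python. The function name and the sample sentence (the opening line of Pride and Prejudice) are my own choices for illustration, not anything taken from the projects mentioned above.

```python
import re
from collections import defaultdict

def build_concordance(text, width=3):
    """Map each word to the short contexts in which it appears."""
    # Lowercase the text and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        # Keep up to `width` words on either side of the keyword.
        start = max(0, i - width)
        context = " ".join(words[start:i + width + 1])
        concordance[word].append(context)
    return concordance

sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
index = build_concordance(sample, width=2)
for line in index["in"]:
    print(line)
```

Each entry lists every place a word turns up along with a few words of surrounding context. A computer gets through a sentence like this in well under a second; scaling the same lookup to the whole of Aquinas is what took a million man-hours on punch cards.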

The proliferation of digitized books, courtesy of Google Books, is what makes it possible for computers to now process huge volumes of text from thousands of works. Matthew Jockers, along with Franco Moretti, founded the Stanford Literary Lab in 2010. The research is done in groups, along the lines of scientific investigations, with the help of computers.

The approach is critiqued in a Chronicle of Higher Education article, The Humanities Go Google:

Data-diggers are gunning to debunk old claims based on "anecdotal" evidence and answer once-impossible questions about the evolution of ideas, language, and culture. Critics, meanwhile, worry that these stat-happy quants take the human out of the humanities. Novels aren't commodities like bags of flour, they warn. Cranking words from deeply specific texts like grist through a mill is a recipe for lousy research, they say—and a potential disaster for the profession.

It’s not just a matter of traditionalists feeling threatened by computer power. Algorithms that depend on Google Books for metadata tags may reach wrong conclusions. Geoffrey Nunberg, a linguist, is quoted as declaring Google’s tags "a mess," not to be relied on. Aside from questions of accuracy, there is that of relevance. Researchers have to ask themselves: "What does this tell me that what we can't already do?"

I had the same question when I read the article on Jockers. Aside from identifying the novel’s trail set by Austen, it points out the supposed revelation that the novels of George Eliot "more closely resemble the patterns of male writers." Is it altogether surprising that the author of Silly Novels by Lady Novelists, who deliberately adopted a masculine pseudonym, broke the mold conceived for female writers? That’s something that any student of Victorian literature should already know.

What this form of research could do that traditional studies do not is unearth the roads not taken by the literary canon. In a New Scientist article on Jockers’ work, Nicholas Dames, chair of the department of English and comparative literature at Columbia University, is cited as seeing the value of this type of research in bringing to light the full body of fiction "rather than the small percentage of canonical texts that are usually taken as exemplary." That opens up the consideration of the canon in a larger context, which can lead to questioning the marked trail of influence. But that will only work if the Google Books data proves comprehensive and reliable enough to accurately represent the literature of the time.