The 2014 parliamentary elections in India witnessed two trends: a huge increase in new voters and massive advances in technology. In a recent news report, we covered how analysts were using data from a range of sources – Facebook likes, tweets, calling behaviour, SMS messages – to help politicians better understand the electorate and decide where to target their efforts.

In line with this, a data analytics startup called Modak Analytics recently announced that it has built India’s first Electoral Data Repository. The project covered data on 814 million voters, making it the largest of its kind in the world. In comparison, the USA has 193.6 million voters, Indonesia 171 million, and the UK 45.5 million.

The complexities included:

– 543 parliamentary and 4,120 assembly constituencies

– 930,000 polling booths

– Voter Rolls in PDF in 12 languages

– 900,000 PDFs, amounting to 25 million pages to be deciphered

– Diverse range of Voter Names and Information
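To put the figures above in proportion, the short Python sketch below derives a few averages from them. It uses only the numbers quoted in this article; the variable and function names are illustrative, and the results are simple averages, not figures reported by Modak Analytics.

```python
# Figures quoted in the article.
VOTERS = 814_000_000
POLLING_BOOTHS = 930_000
PDF_FILES = 900_000
PDF_PAGES = 25_000_000
PARLIAMENTARY_CONSTITUENCIES = 543


def average(total, units):
    """Return the plain arithmetic average of `total` spread over `units`."""
    return total / units


# Derived averages (illustrative only; real distributions are uneven).
scale = {
    "voters_per_booth": average(VOTERS, POLLING_BOOTHS),
    "pages_per_pdf": average(PDF_PAGES, PDF_FILES),
    "voters_per_parliamentary_constituency": average(
        VOTERS, PARLIAMENTARY_CONSTITUENCIES
    ),
}

for name, value in scale.items():
    print(f"{name}: {value:,.1f}")
```

On average this works out to roughly 875 voters per polling booth, about 28 pages per PDF, and about 1.5 million voters per parliamentary constituency.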

To deal with the complexity, the infrastructure used for the project included a 64-node Hadoop cluster, PostgreSQL, and “servers that process a master file containing over 8 Terabytes of data.” Machine learning algorithms were also developed to help categorize people based on name, geography, religion, caste and ethnicity.

“Data from multiple sources like Census, Economic and Social surveys were mapped to polling booths,” said Aarti Joshi, the co-founder and executive vice president of Modak Analytics. “Simultaneously, external and proprietary data sources had to be fused with individual voters’ data. Because of this complex nature, no big IT company ever ventured into this.”
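The article does not describe how this mapping was implemented. As a rough illustration only, the Python sketch below shows one simple way booth-level attributes could be joined onto individual voter records. Every identifier, field name and value in it is hypothetical and is not taken from Modak Analytics' system.

```python
# Hypothetical booth-level attributes, e.g. derived from census or survey data.
booth_attributes = {
    "B001": {"district": "District A", "literacy_rate": 0.71},
    "B002": {"district": "District B", "literacy_rate": 0.64},
}

# Hypothetical voter records, each tagged with the polling booth it belongs to.
voter_records = [
    {"voter_id": "V1", "booth_id": "B001"},
    {"voter_id": "V2", "booth_id": "B001"},
    {"voter_id": "V3", "booth_id": "B002"},
    {"voter_id": "V4", "booth_id": "B999"},  # booth with no external data
]


def enrich(voters, attributes_by_booth):
    """Copy each voter record and merge in the attributes of its booth.

    Voters whose booth has no attributes are kept unchanged, so no
    record is lost in the join.
    """
    enriched = []
    for voter in voters:
        extra = attributes_by_booth.get(voter["booth_id"], {})
        enriched.append({**voter, **extra})
    return enriched


enriched_records = enrich(voter_records, booth_attributes)
unmatched = [r["voter_id"] for r in enriched_records if "district" not in r]

for record in enriched_records:
    print(record)
print("Voters without booth-level data:", unmatched)
```

Keeping unmatched voters in the output, and listing them separately, makes gaps in the external data visible instead of silently dropping records.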

While President Obama’s 2008 and 2012 campaigns were the first major instances of social media and big data being used in politics, India’s 2014 elections have set a new precedent. Data collection in India is markedly more difficult than in the US: the country is vast, heterogeneous and non-uniform. As such, the work conducted by Modak Analytics is not only ground-breaking, but also particularly relevant for businesses dealing with complex, unstructured data.


(Image Credit: Yogesh Mhatre)
