Monday, July 20, 2015

Americans know nothing about "big data"

In Silicon Valley, Boston, Los Angeles, and elsewhere, you can hear a lot of us Americans prattling on about "big data," and about our expertise in big data.

But if you really want to understand big data, you have to go to a place where the total population isn't measured in the mere hundreds of millions.

China's not technologically there yet, but India certainly is.

Dataquest recently wrote about an Indian company that performed data analytics on 81.4 crore voters. And if you don't know about the crore unit of measurement, you'd better learn or be left behind (not in the dispensationalist sense).

The sheer number of voters wasn't the only issue that Modak Analytics faced:

Some additional challenges were peculiar to India — voter rolls were in PDF format in 12 languages. Modak Analytics had to analyze over 9 lakh PDFs amounting to over 2.5 crore pages to be deciphered for any analysis. This data was mapped to 9.3 lakh polling booths across 543 parliamentary and 4,120 assembly constituencies.
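For those of us still learning the units: 1 lakh is 100,000, and 1 crore is 100 lakh, or 10,000,000. The short Python sketch below converts the figures quoted above into plain counts. The helper name `to_western` is my own invention for illustration. Also, several of the quoted numbers are lower bounds ("over 9 lakh", "over 2.5 crore"), so the converted values are minimums too.

```python
# Indian numbering units: 1 lakh = 100,000; 1 crore = 100 lakh = 10,000,000.
LAKH = 100_000
CRORE = 100 * LAKH


def to_western(value, unit):
    """Convert a figure quoted in lakh or crore into a plain integer count.

    round() absorbs floating-point noise, e.g. 81.4 * 10,000,000.
    """
    multipliers = {"lakh": LAKH, "crore": CRORE}
    return round(value * multipliers[unit])


# Figures from the Dataquest piece (some are stated there as "over" X).
figures = [
    ("voters", 81.4, "crore"),
    ("PDF files", 9, "lakh"),
    ("PDF pages", 2.5, "crore"),
    ("polling booths", 9.3, "lakh"),
]

for label, value, unit in figures:
    print(f"{value} {unit} {label} = {to_western(value, unit):,}")
```

So 81.4 crore voters works out to 814 million people, which is more than two and a half times the entire U.S. population.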

“Every state had the data in their own vernacular language. For example, Tamil Nadu had the data in Tamil, Maharashtra in Marathi and Karnataka had this data in Kannada. To do any kind of data analytics, it was important to convert the data into a single language.”

To overcome these and other challenges, Modak Analytics had to perform a lot of automation in its analysis.

Read the details here.