Wikipedia, many of us agree, is a marvelous resource. No less a journal than the prestigious Nature did a study, and found that the Wiki is about as accurate as the Encyclopaedia Britannica. A visit to Wikipedia has become almost a daily affair for me. But that’s not the main point of this post.
I think the Wiki also provides a very good and real estimate of a number of socio-economic indicators (in this case for India). There’s been too much media talk and hype about IT revolutions in India, and greater nationwide penetration of the internet. So much so that some of us have even started believing the hype. The Wiki turns out to be a very good indicator of how true that is.
The Wiki, after all, is the ultimate internet tool. It provides information, and is created and maintained by all users. So, the more aware a society is of the internet and its power, the more it’s likely to use the Wiki. So I decided to take a look at the language-wise breakdown in the Wiki. The order of representation of Indian languages (when compared to each other) hardly surprised me, but something else did.
The most represented Indian languages are Telugu, Tamil and Kannada (see the screen capture). This is not that surprising, since these are the languages of the most “IT enabled” states in the country. But it’s the sheer number (or lack) of entries that caught my attention. Only Telugu, a language spoken by around 70 million people, has more than 3000 entries. It ranks 64th in terms of number of entries, with a similar number of entries to Irish-Gaelic, Kurdish, and Latvian. Tamil has around 2500 entries and Kannada has about 1500. Hindi, India’s most widely spoken language (with an estimated 400 million native speakers, and perhaps a couple of hundred million more non-native speakers), has only around 1200 entries in total. Other major Indian languages (Bengali, Punjabi, Marathi, Gujarati, Kashmiri) have only a few hundred entries at most. In contrast, Russian and Chinese have over 50000 entries each, and Indonesian and Malay have well over 10000 entries each. So the general situation in India becomes clearer (and I think these numbers reflect it well) without having to resort to expensive or inaccurate surveys.
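To put the gap in per-speaker terms, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate figures quoted in this post (Telugu: about 70 million speakers and a little over 3000 entries; Hindi: about 400 million native speakers and around 1200 entries). The variable and function names are just my own, and the results are ballpark numbers, not a proper measurement.

```python
# Rough figures as quoted in this post (approximate, not fresh data).
# Telugu's entry count is a lower bound ("more than 3000"), and the
# Hindi speaker count covers native speakers only.
figures = {
    "Telugu": {"speakers_millions": 70, "entries": 3000},
    "Hindi": {"speakers_millions": 400, "entries": 1200},
}


def entries_per_million(speakers_millions, entries):
    """Return Wiki entries per million speakers."""
    return entries / speakers_millions


per_million = {
    lang: entries_per_million(**data) for lang, data in figures.items()
}

for lang, value in per_million.items():
    print(f"{lang}: about {value:.0f} entries per million speakers")
```

On these numbers Telugu comes out at roughly 43 entries per million speakers and Hindi at about 3, so even the best represented Indian language is thinly covered relative to the number of people who speak it.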
To me, given that the majority of the population speaks one or the other Indian language, this says a few things:
1) Even if internet penetration has increased (with reports that over 10% of the population has at least some access to the internet, through net cafes etc.), understanding of the power of the internet remains minimal.
2) The concept of the internet as an “enabling tool” is yet to catch on.
3) The internet in India is not widely used as an information gathering, sharing and educational resource.
4) The internet is not reaching the masses (who are most fluent in their native language), and/or all internet/computer education in India is only in English.
This means that (a) there is a large untapped market and a fantastic economic opportunity for someone to go in there and create IT enabled learning in vernacular languages. It is not as if all Indian language speakers are poor. In fact, a majority of the population (even in cities) is most comfortable in Indian languages, and Indian language newspapers outsell their nearest English rivals several times over. (b) If tapped, there’s a great deal of creative energy here that’s waiting to be released. The extended question would be: if one were to start educating people to use computers and the internet in Indian languages, where would one start? Would it be in an area with some infrastructure and awareness (e.g. Andhra, Tamil Nadu, Karnataka) or in a completely untapped and unexplored region (e.g. most of North India)?