Friday, April 12, 2013

TDIL (Technology Development for Indian Languages)


The Department of Information Technology initiated the TDIL (Technology Development for Indian Languages) programme with the objective of developing information processing tools and techniques to facilitate human-machine interaction without a language barrier; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services. Many issues have to be tackled because of India's diverse and complex language landscape. Key issues include keyboard layout, IT localization, the Vedic code set, and Unicode for Indian scripts, among others.
Indian language keyboards are categorized into three types, namely Inscript, Phonetic, and Typewriter keyboards. The Indian language alphabet table is divided into vowels (Swar) and consonants (Vyanjan). The INSCRIPT (Indian Script) keyboard layout was standardized by the Department of Information Technology (DIT) and declared a national standard by the Bureau of Indian Standards (BIS). More recently, a common solution was worked out for incorporating the new Indian Rupee symbol (₹).

Software tools for typing languages


Software is the set of instructions, written in a programming language, that a computer executes.
NiLa, Varamozhi, and Lipikaar are examples of Malayalam typing software. Lipikaar is a simple method for typing Malayalam on an ordinary keyboard. It requires almost no learning, and within a few seconds you will be able to type any Malayalam word you can imagine. It works in all Windows applications, including MS Office, as well as websites, chat, and e-mail.
Typing software is different from the transliteration software found in Gmail. In transliteration, you spell out the pronunciation of the Malayalam word in English, and the algorithm then converts the word into Malayalam script.
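As a rough illustration of how such a conversion works, here is a minimal transliteration sketch in Python. The rule table and the greedy longest-match strategy are invented for illustration only; real engines use far larger rule tables plus dictionary lookup and suggestion lists.

    # Toy transliteration: map Latin-letter sequences to Malayalam
    # characters. This rule table is a tiny illustrative subset.
    RULES = {"ka": "ക", "ma": "മ", "la": "ല", "ra": "ര", "a": "അ"}

    def transliterate(word):
        out, i = [], 0
        while i < len(word):
            # Greedily try the longest match first (2 letters, then 1).
            for size in (2, 1):
                chunk = word[i:i + size]
                if chunk in RULES:
                    out.append(RULES[chunk])
                    i += size
                    break
            else:
                out.append(word[i])  # pass unknown letters through
                i += 1
        return "".join(out)

    print(transliterate("kamala"))  # -> കമല

Even this toy version hints at the difficulty: the right output depends entirely on how the user chooses to spell the word in English.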
There are several problems with this approach:
1. Ambiguity
Transliteration is suitable for common words that can be spelled easily. However, for words that are not part of everyday conversation, figuring out the correct English spelling may not be simple. Typing words accurately may require trial and error, making transliteration unsuitable for professional use.
2. Fluency in English
Transliteration requires users to have fluency in English so that they can spell the Malayalam
word phonetically.
3. Silent Characters
Malayalam, Tamil, and other Indic scripts contain many silent characters, as well as words that are written differently even though they sound alike. Even an intelligent transliteration algorithm finds such words difficult to interpret.
4. Writing Names, Addresses and other non-dictionary words
Since transliteration is based on a dictionary approach, typing names, addresses, and other non-dictionary or hybrid words becomes difficult.
Transliteration is thus more suitable for users who think in English, and is meant for typing common words and a few sentences.

Using computers in the local language


Many organizations have been working to make computers available in the different Indian languages. However, because of the multiplicity of languages (the Constitution of India recognizes 22 scheduled languages), the issue is quite complicated. Only two essential components are required to represent a language on computers: the language must have a script, and it must be possible to represent that script on the computer. Computers "understand" English because they were developed by people who used English.

In the Bharatbhasha system, one can use computers in Indian languages without paying extra for hardware and software. Computer applications in Indian languages can only be prepared by people who know computer programming; this part of the work therefore has to be taken up by those who know both computers and Indian languages.
Nowadays there are virtual or "on-screen" keyboards that let you type directly in your local language script in an easy and consistent manner, no matter where you are or what computer you are using. Malayalam is also among the languages supported.

Language localization


Language localization (from the English term locale, "a place where something happens or is set") is the second phase of a larger process of product translation and cultural adaptation (for specific countries, regions, or groups) to account for differences in distinct markets, a process known as internationalization and localization. Language localization is not merely a translation activity: it involves a comprehensive study of the target culture in order to correctly adapt the product to local needs. Localization is sometimes referred to by the numeronym "L10N" (as in: "L", followed by ten more letters, and then "N").
The localization process is most generally related to the cultural adaptation and translation
of software, video games, and websites, and less frequently to any written translation (which may
also involve cultural adaptation processes). Localization can be done for regions or countries where people speak different languages, or where the same language is spoken: for instance, the dialects and idioms of Spanish spoken in Spain differ from those spoken in Latin America; likewise, word choices and idioms vary among countries where English is the official language (e.g., the United States, the United Kingdom, and the Philippines).
Keyboards that are sourced from the Gulf incorporate Arabic letters above each key.
Regional languages in India are also being introduced into keyboards as part of localization.

IT AND REGIONAL LANGUAGES


In a large, geographically dispersed, demographically multilingual country like India, the common thread in implementing and achieving the basic objectives of governance has been the development and adoption of language computing tools and methodologies. Government officials in the various states, non-government functionaries across the country, and the people at large mostly use their own languages in day-to-day work, be it in government administration at various levels, in business, in the professions, in services, or in school education. Thus, if the fruits of the information technology revolution are to spread to all these participants, in government and among the public, it is best done through the use of computers in their own languages.
The Centre for Development of Advanced Computing (C-DAC) has made pioneering contributions in developing Indian language tools with natural language processing, and in evolving script and font standards through its GIST technology, to enable and spread the use of computers in various languages. It accordingly took up the initiative of developing important e-governance solutions in Indian languages, which impact both the government and citizens. This initiative started in 1997 and had grown to a significant extent by the end of 2001.

What is Unicode?


Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers, and no single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
The name 'Unicode' is intended to suggest a unique, unified, universal encoding.
These encoding systems also conflict with one another: two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data runs the risk of corruption.
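A small Python demonstration of such a conflict, using two real legacy encodings (the example itself is illustrative, not from the original text): a single byte decodes to entirely different characters depending on which encoding is assumed.

    # The same byte value means different characters in different
    # legacy encodings: 0xE4 is 'ä' in Latin-1 but the Cyrillic
    # letter 'Д' in KOI8-R.
    b = bytes([0xE4])
    print(b.decode("latin-1"))  # ä
    print(b.decode("koi8-r"))   # Д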
The Unicode standard is the universal character encoding standard, used for the representation of text for computer processing. It provides the capacity to encode all of the characters used in the written languages of the world, and it also supplies information about each character and its use. The standard is very useful for computer users who deal with multilingual text: business people, linguists, researchers, scientists, mathematicians, and technicians. Unicode was originally designed as a 16-bit encoding, providing code points for more than 65,000 characters (65,536); the standard has since been extended with supplementary planes, covering more than a million code points. Unicode assigns each character a unique numeric value and name.
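A brief illustration in Python of this unique value-and-name assignment (the three sample characters are chosen arbitrarily):

    import unicodedata

    # Every character has exactly one code point and one name,
    # irrespective of platform, program, or language.
    for ch in ("A", "ക", "₹"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # U+0041 LATIN CAPITAL LETTER A
    # U+0D15 MALAYALAM LETTER KA
    # U+20B9 INDIAN RUPEE SIGN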
Unicode provides a unique number for every character, irrespective of platform, program, or language. The Unicode standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, and many others.
Unicode's success at unifying character sets has led to its widespread and predominant use in the
internationalization and localization of computer software. The standard has been implemented in
many recent technologies, including XML, the Java programming language, the Microsoft .NET
Framework, and modern operating systems. It is supported in many operating systems and in all
modern browsers. The emergence of the Unicode Standard and the availability of tools supporting it are among the most significant recent global software technology trends.
Incorporating Unicode into client-server or multi-tiered applications and web sites offers
significant cost savings over the use of legacy character sets. Unicode enables a single software
product or a single website to be targeted across multiple platforms, languages and countries
without re-engineering. It allows data to be transported through many different systems without
corruption.
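As a small sketch of that lossless transport in Python (the sample string is arbitrary):

    # UTF-8, the most widely used Unicode encoding, round-trips
    # text losslessly between systems and platforms.
    text = "കേരളം"  # "Keralam" in Malayalam script
    data = text.encode("utf-8")    # bytes safe to store or transmit
    assert data.decode("utf-8") == text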

The Unicode Consortium is a non-profit organization founded to develop, extend, and promote use of the Unicode Standard, which specifies the representation of text in modern software products and standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. The consortium is supported financially solely through membership dues.

GREEN COMPUTING


Green computing is the study and practice of using computing resources efficiently. Some of the key approaches followed as part of green computing are:
Efficient algorithms: An inefficient algorithm requires more memory and more execution time, and in that way leads to energy and resource wastage. It is therefore advisable to keep algorithms efficient from the point of view of the time-space tradeoff (a small sketch follows this list).

Virtualization of computer systems: This involves creating multiple virtual computer systems, each serving its own function, on a single physical hardware system. The entire virtualization concept is based on the approach of "optimum utilization of available resources".
Power management: This involves managing power in such a way as to minimize its wastage. It makes use of software-controlled power management features to eliminate power wastage; examples include screen savers, automatic standby, etc.
Power generation: All computers require electrical power to operate. One of the goals of green computing is to use power generated from sources that are more environmentally friendly than coal-fired power stations.
Recycling e-waste: This approach focuses on recycling e-waste (old, broken, or useless electrical and electronic devices).
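As a rough sketch of the time-space tradeoff mentioned under "Efficient algorithms" (the data and sizes here are arbitrary): a Python set spends extra memory on a hash table so that lookups are far faster than scanning a list.

    import time

    # Membership test: a list scans elements one by one (O(n)),
    # while a set uses a hash table (O(1) on average) at the cost
    # of extra memory.
    items_list = list(range(1_000_000))
    items_set = set(items_list)

    start = time.perf_counter()
    _ = -1 in items_list           # worst case: scans every element
    print("list lookup:", time.perf_counter() - start)

    start = time.perf_counter()
    _ = -1 in items_set            # a single hash probe
    print("set lookup: ", time.perf_counter() - start)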