The power of voice

The telephone is the most popular medium of communication, and not only between humans. Voice technologies allow us to "replace" the transmitter or the receiver with automatic systems that provide services, information or even entertainment.

By Nick Makrymanolakis, journalist, nmakry at freemail.gr

Let's consider the following hypothetical but extremely realistic scenario: an educational company advertises its new educational programs during evening prime time. Those who see the advertisement and try to call the company immediately will be disappointed, because the company's call center operates only during morning hours. The next morning the company will receive calls, but only from those who kept its telephone number, still remember the advertisement, and manage to reach an agent at the company's call center.
The company in our example could have been contacted by considerably more prospective customers without having agents at its call center 24 hours a day. With the use of appropriate applications and voice technologies, such as voice recognition or even speech synthesis, the company could operate an automatic call center, communicating with customers all day without losing a single call. All of this translates into direct and indirect benefits for the company and improves its image among customers.

Voice: A real opportunity for information provision

Nowadays, we all need information. There is plenty of information on computer networks, primarily on the Internet. However, we are often not in front of a computer at the exact time we need some information. And although information can be provided by telephone to everyone at any time, having human agents deliver it is expensive.
The evolution of technology allows verbal communication to be automated to a large extent. The user asks for the information he/she wants by voice, but at the other end of the line there isn't always a human being, since an automatic system is able to reply.

These functions are what Interactive Voice Response (IVR) systems provide. Such systems exploit technologies such as voice recognition, speech synthesis, or prerecorded messages combined with speech synthesis.

Companies using such systems gain multiple benefits, not to mention a high Return On Investment (ROI). For example, with a modern IVR system a company achieves:

* Provision of support or information to its customers on a 24x7 basis.
* Reduced call center costs, since a significant number of calls are handled by the automatic system and not by humans.
* New communication channels with its customers.

Besides, we shouldn't forget that voice is the most natural, convenient and fast medium of communication. No special skills are required to receive verbal information, in contrast to the use of computers, which requires specific skills. Today, products and applications are judged not only by their functionality but also by their user-friendliness. In this respect, voice undeniably has a big advantage.


It is, indeed, easy

Voice technology applications are mature enough to allow the rapid development of automatic information systems. During the recent national elections in Greece in October 2002, the information system provided by DELTA-Singular allowed citizens to call a premium-rate phone number, say the name of the Prefecture or Municipality they were interested in, and get an immediate answer with the current result. The whole system was set up in a short period of time, with the assistance of Dienekis Informatics.

Mr. Nikolaos Skouras, Business Services Director in the Services Outsourcing Division of DELTA-Singular, says that of the two voice recognition systems available for the Greek language, from Nuance and Philips, they selected the former because it handled voice over the telephone better. Of the three speech synthesis systems available in the Greek market, they selected "Aesop" from the Aristotle University of Thessaloniki, because Dienekis Informatics was already familiar with it. The system operated during the elections as a pilot installation. According to Mr. Skouras: "DELTA-Singular will use this specific know-how for developing other applications for information provision on credit cards' limits etc."

Since each project consists of independent applications, a different application can be used in each installation. For example, besides "Aesop", the speech synthesis tools available for Greek are the "Ekfonitis" text-to-speech tool by the Institute for Language and Speech Processing (ILSP / R.C. "Athena") and "Demosthenes" by the University of Athens. These programs are under continuous development to bring their speech quality closer to that of a human. Mr. Skouras pointed out the effort invested in improving "Aesop" for certain idiomatic cases of the Greek language, while the newer version of the program, which also offers a female voice, is expected to sound even more natural.

The advantages of a female voice in speech synthesis were also pointed out by Dr. Stelios Bakamidis, Head of the Speech Technology Department of ILSP / R.C. "Athena", who presented the female voice in the latest version of "Ekfonitis". The tool is available on the market on a CD-ROM, so it is easy for anyone to test the quality of its speech. Those who have followed its development will notice how much it has improved from version to version, allowing the rapid development of quality applications.

"Open sesame"

In the summer of 2000, Cosmote, the biggest mobile telephony company in Greece, launched "MyCosmos", a voice portal navigated exclusively by voice. The idea for the project came from "Dialogos", the Greek representative of Nuance, which specializes in voice applications. Dialogos was interested in collaborating with a big company in order to promote its integrated voice technology application.

Development started in October 1999, and the first service that MyCosmos provided, in the summer of 2000, was information about stocks listed on the Athens Stock Exchange, using voice recognition for the names of listed companies. The portal was progressively enriched with other services that allowed less restricted questions. For example, the cinema service allows questions like "Where can I watch the 'Gladiator' in [name of area]", which can be phrased in many different ways. Likewise, for the well-known card game Blackjack, the system was expected to understand each likely reaction of the player, such as "You are cheating", "This is not acceptable" and so on.
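To give a rough idea of what "many different ways" means for a developer, the sketch below maps a few variants of the cinema question to the same two pieces of information, the film and the area. It is only an illustration under stated assumptions: the article does not describe how MyCosmos or the Nuance tools were actually configured, so the phrasings, the function name `parse_cinema_query` and the area names are hypothetical, and plain text stands in for the output of a speech recognizer.

```python
import re

# Hypothetical phrasings for the cinema question; the real MyCosmos
# grammar is not described in the article.
PATTERNS = [
    re.compile(r"where can i (?:watch|see) (?:the )?'?(?P<film>.+?)'? in (?P<area>.+)",
               re.IGNORECASE),
    re.compile(r"which cinemas? in (?P<area>.+?) (?:is|are) (?:showing|playing) "
               r"(?:the )?'?(?P<film>.+?)'?",
               re.IGNORECASE),
    re.compile(r"is (?:the )?'?(?P<film>.+?)'? (?:showing|playing) in (?P<area>.+)",
               re.IGNORECASE),
]


def parse_cinema_query(utterance):
    """Return the film and area named in a recognized utterance, or None."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    text = utterance.strip().rstrip("?.!").strip()
    for pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return {
                "film": match.group("film").strip("'\" "),
                "area": match.group("area").strip(),
            }
    # The utterance is not a cinema question this sketch knows about.
    return None


print(parse_cinema_query("Where can I watch the 'Gladiator' in Kifissia?"))
print(parse_cinema_query("Is Gladiator playing in Patras?"))
print(parse_cinema_query("You are cheating"))
```

Every new phrasing needs its own pattern in a sketch like this, which hints at why services that accept freer questions take more development effort than a fixed list of company names.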

"Each service has its own particularities during the development phase. As the number of services offered increases, the cost of developing new ones decreases", says Mr. Aggelos Polatos, Product Expert at Cosmote. As new, innovative services are added, the number of users calling MyCosmos increases as well. At the end of 2001 there were 5,000 users daily, and it is worth noting that some of them are older people who are not familiar with technology or with accessing information.

"One of our basic objectives was to boost the company into new sectors. We didn't invest purely with economic criteria, but we also calculated the indirect benefits for our corporate image", says Mr. Polatos. The company is currently looking at how to exploit that know-how in other areas, including the provision of content in other languages, especially in the context of the 2004 Olympic Games in Greece. It is also considering the use of voice synthesis instead of prerecorded messages in cases where the latter are impractical.


Interest from investors

The example of a new Greek company working in voice technologies, which began operating in early 2001, is characteristic of the prospects of the area and of how investors see this sector. VoiceWeb was started by a team of scientists who were initially supported by a Greek incubator, InQLab, while later on SETE Ventures LP - a venture capital fund of Latsis' Financial Organization - invested in the company.

From the beginning, VoiceWeb cooperated with a large number of companies, working towards the creation of an integrated platform providing voice technology solutions. "It is very important for us to know all available technologies, in order to exploit the best ones for each application. For instance, in an application for mobile telephony we would use the Nuance voice recognition tool, which has been trained in a relevant environment, while for the conventional telephones we would prefer the Philips voice recognition technology", says Dr. Nikos Patsis, VoiceWeb CEO.

The Greek market is less aware of the importance of customer support than other European countries. "There is an innate difficulty in discussing our proposed solution and its cost with a customer, when he is not aware of the cost scale of such a market", says Dr. Patsis, who also emphasizes: "Technology is not an end in itself. Although it is possible to automate several procedures, it is not profitable to work for months on an application to be used only 10 times a year."

VoiceWeb has developed several applications exploiting voice technologies. Examples include "Sports Bet" and the game "Blackjack", in which voice recognition is combined with occasional prerecorded messages for the system's responses. At the request of Vizzavi Greece, VoiceWeb has developed that company's voice portal, which provides diverse services. VoiceWeb also maintains its own voice portal, where information on weather, sports, cinema, stocks etc. is available by telephone for the cost of a regular call (telephone number: +30-210-8108000).

VoiceWeb's plans include the development of an application that will connect personal email and contacts from Microsoft Outlook with the telephone. Using speech synthesis technologies, one will be able to listen to emails over the telephone and also compose, forward and otherwise manage them. A similar application from AOL has worked out quite well in the USA, with almost 2,000,000 subscribers. Along the same lines, the national (GSRT) project EFONO, undertaken by a consortium of ILSP / R.C. "Athena" and Panafon-Vodafone, has worked on developing an experimental application for managing emails.


Technology maturity and prospects

The future of voice technologies will be determined by the quality of applications and the corporate need for them. As Dr. Bakamidis points out, an important factor in the further growth of language technology is the research programs financed by the European Commission and by national sources. The Institute for Language and Speech Processing is actively involved in several research programs for the improvement of existing technologies. To name some EU projects, ERMIS, IMUTUS and MUSA are devoted to a more sophisticated incorporation of language technologies into more complicated environments.

More specifically, the aim of the ERMIS project is to model the emotional state of a user, so that the machine can respond appropriately. IMUTUS makes it possible to teach oneself a musical instrument and also works on music recognition, while the MUSA project allows the automatic subtitling of films and news broadcasts. Two of these three projects are coordinated by ILSP / R.C. "Athena".

Regarding speech synthesis, Dr. Patsis believes that: "The technology will be mature enough when we will be able to generate a whole news bulletin", defining the problem of speech synthesis systems as a lack of natural intonation, pitch and prosody. Mr. Skouras shares this opinion, as does Dr. Bakamidis, who points out: "Technology has advanced considerably and we can "listen" to it in every new version of 'Ekfonitis' which is considerably better than the previous one".
Both Dr. Bakamidis and Dr. Patsis see great prospects for the use of voice technologies in the public sector. The Greek State falls short in offering innovative, high-quality services to citizens, although the latest achievements and the projects either underway or proposed allow for a more hopeful view. "We see that State Agencies are very interested, mainly for information management, and this, indeed, is something very encouraging", says Dr. Patsis.

Also very promising is the interest shown by the private sector, especially by companies that run call centers. Companies such as banks operate these centers in order to further strengthen their services. However, they are aware that their operational costs could be reduced by using voice technologies.

Undoubtedly, talking about voice technologies some years ago, especially in Greece, would have sounded like science fiction, due to the absence of Greek language technology tools equivalent to the ones that existed for other languages. Today, although a lot remains to be done to further improve the existing tools and applications, there is tangible proof of progress. Indeed, the European Union of 15 countries and 13 official languages relies on systems that will provide multilingual e-content in real time and at the lowest possible cost.

**********

Nick Makrymanolakis is a journalist with studies in information technology, marketing, public relations and advertising. Since 1993, he has covered technology applications in business, with an emphasis on information technology and the Internet. Today he is the editor-in-chief of the Greek business magazine New Economy Observer (www.neomag.gr).

**************

Box 1

Greek companies and agencies

* Aristotle University of Thessaloniki - www.auth.gr
* Vizzavi, web portal - www.vizzavi.gr
* VoiceWeb, voice applications integrator - www.voiceweb.gr
* DELTA-Singular, software and IT services company - www.deltasingular.gr
* Dialogos, representative of Nuance in Greece - www.speech.gr
* Dienekis Informatics, IT services company - www.dienekis.gr
* Institute for Language and Speech Processing, research institute - www.ilsp.gr
* Cosmote, mobile telephony provider - www.Cosmote.gr
* Knowledge, representative of Philips voice applications - www.knowledge.gr
* University of Athens - www.uoa.gr

Foreign companies and agencies

* Dialogic, telephone cards maker - www.dialogic.com
* Envox, voice platforms - www.envox.com
* Nuance, voice applications - www.nuance.gr
* Philips, voice applications - www.Philips.com


EU voice technology related projects with Greek participation

* ACCeSS (Automated Call Center through Speech Understanding System) -
* BALKANET -
* CATCH-2004 (Converse in Athens-2004, Cologne, Helsinki) -
* DICOPRO (On-Line Dictionary Consultation For Language Professionals On Intranet) - issco-www.unige.ch/projects/dicopro_public/
* E-MATTER (E-Mail Access through the Telephone Using Speech Technology Resources) -
* ERMIS (Emotionally Rich Man-machine Interaction Systems) - www.image.ntua.gr/ermis
* GEMINI (Generic Environment for Multilingual Interactive Natural Interfaces) -
* HOPE (HLT Opportunity Promotion in Europe) -
* IDAS (Interactive Telephone-based Directory Assistance Services) -
* IMUTUS (Interactive Music Tuition System) - www.exodus.gr/imutus
* M-PIRO (Multilingual Personalised Information Objects) - www.ltg.ed.ac.uk/mpiro
* MULTITRAIN (An Integrated Platform For Multimedia Skilled Workforce Enhancement By Providing Focused Training In Digital Content) -
* MUSA (Multilingual Subtitling of Multimedia Content) -
* ONOMASTICA (Multi-Language Pronunciation Dictionary of Proper Names) -
* ORIENTEL (Multilingual access to interactive communication services for the Mediterranean and the Middle East) - www.orientel.org
* SPEECHDAT (Speech Databases for Creation of Voice Driven Teleservices) - speechdat.phonetik.uni-muenchen.de

****************

Box 2

Glossary

* Interactive Voice Response (IVR): Technology used in automated systems that are contacted via telephone. The user states his interest verbally or with the use of the telephone keypad.
* Speech Recognition: The automatic process of recognising spoken words and converting them into text or commands that a system can act on.
* Speech Synthesis: The automatic process of turning electronic text into synthetic speech, based on phoneme manipulation.
* Phoneme: The smallest unit of sound that distinguishes one word from another in a spoken language or dialect.

******************

Quotes from interviews

"Technology is not an end in itself. Although it is possible to automate several procedures, it is not profitable to work for months on an application to be used only 10 times a year."

Dr. Nikolaos Patsis, CEO, VoiceWeb

---------

"One of our basic objectives was to boost the company into new sectors. We didn't invest purely with economic criteria, but we also calculated the indirect benefits for our corporate image."

Aggelos Polatos, Product Expert, Cosmote

-------

"Technology has advanced considerably and we can "listen" to it in every new version of 'Ekfonitis' which is considerably better than the previous ones".

Dr. Stelios Bakamidis, Head of the Speech Technology Department at the Institute for Language and Speech Processing