The power of voice
The telephone is the most popular medium of communication, but not only between
humans. Voice technologies allow us to "replace" the transmitter or the
receiver with automatic systems that provide services, information or even
entertainment.
By Nick Makrymanolakis, journalist,
nmakry at freemail.gr
Let's consider the following hypothetical but entirely realistic scenario: An
educational company advertises its new educational programs during the evening
prime-time slot. Those who see the advertisement and try to call the company
immediately will be disappointed, because the company's call center operates
only during morning hours. The next morning the company will receive calls, but
only from those who kept its telephone number, still remember the
advertisement, and manage to reach an agent at the company's call center.
The company of our example could have been contacted by considerably more
prospective customers, without having agents at its call center 24 hours a day.
With the use of appropriate applications and voice technologies, such as voice
recognition or even speech synthesis, the company could operate an automatic
call center, communicating with customers all day, without losing a single call.
All of this translates into direct and indirect benefits for the company and
improves its image with customers.
Voice: A real opportunity for information provision
Nowadays, we all need information. There is plenty of information on computer
networks and primarily on the Internet. However, it may be quite difficult to be
in front of a computer at the exact time we need some information. And although
information can be provided by telephone to anyone at any time, having human
agents deliver it is expensive.
The evolution of technology allows, to a large extent, for the automation of
verbal communication. Thus, the user asks for the information he/she wants,
using his/her voice, but at the other end of the line there isn't always a human
being, since an automatic system has the ability to reply.
These functions make up Interactive Voice Response (IVR) systems, which exploit
technologies such as voice recognition, speech synthesis, or prerecorded
messages combined with speech synthesis.
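To make the idea concrete, here is a minimal sketch in Python of how such a
system might assemble a reply. It is an illustration under stated assumptions,
not the design of any product mentioned in this article: the recognizer is a
stand-in that receives the caller's words as text, and the names PRERECORDED,
Segment, recognize and handle_call, as well as the audio file names, are
hypothetical. Fixed prompts are played from recordings, while content that
changes often is handed to a speech synthesizer as text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical catalogue of prerecorded prompts (file names are placeholders).
PRERECORDED = {
    "greeting": "greeting.wav",
    "not_understood": "not_understood.wav",
}


@dataclass
class Segment:
    kind: str     # "prerecorded" or "synthesized"
    content: str  # audio file name, or text to pass to a speech synthesizer


def recognize(utterance: str) -> str:
    """Stand-in for a speech recognizer: the caller's words arrive as text."""
    return utterance.strip().lower()


def handle_call(utterance: str, data: Dict[str, str]) -> List[Segment]:
    """Build the reply to one caller request as a list of audio segments."""
    reply = [Segment("prerecorded", PRERECORDED["greeting"])]
    words = recognize(utterance)
    for topic, answer in data.items():
        if topic in words:
            # Dynamic content changes too often to record, so it is synthesized.
            reply.append(Segment("synthesized", answer))
            return reply
    # No known topic was heard: fall back to a fixed, prerecorded message.
    reply.append(Segment("prerecorded", PRERECORDED["not_understood"]))
    return reply
```

Calling handle_call("What is the weather today?", {"weather": "Sunny."}) would
yield a prerecorded greeting followed by a synthesized answer. A real
installation would replace recognize with a speech recognition engine and play
each segment over the telephone line.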
There are multiple benefits for the companies using such systems, not to mention
the high Return On Investment (ROI). For instance, with a modern IVR system a
company achieves:
* Provision of support or information to its customers on a 24x7 basis.
* Reduced call center costs, since a significant number of calls are handled
by the automatic system and not by humans.
* New communication channels with its customers.
Besides, we shouldn't forget that voice is the most natural, convenient and
fastest medium of communication. No special skills are required to receive
verbal information, in contrast to the use of computers, which requires
specific skills. Today, products and applications are judged not only by their
functionality, but also by their user-friendliness. In this respect, voice
undeniably has a big advantage.
It is, indeed, easy
Voice technology applications are mature enough to allow for the fast
development of automatic information systems. During the recent national
elections in October 2002 in Greece, the information system provided by
DELTA-Singular allowed citizens to call a premium-rate phone number, state
verbally the Prefecture or Municipality they were interested in, and get an
immediate answer with the current result. The whole system was set up in a
short period of time, with the assistance of Dienekis Informatics.
Mr. Nikolaos Skouras, Business Services Director in the Services Outsourcing
Division of DELTA-Singular, says that of the two voice recognition systems
available for the Greek language, from Nuance and Philips, they selected the
former, because it handled voice over the telephone better. Of the three speech
synthesis systems available in the Greek market, they selected "Aesop" from the
Aristotle University of Thessalonica, because Dienekis Informatics was already
familiar with it. The system operated at the elections as a pilot
installation. According to Mr. Skouras: "DELTA-Singular will use this specific
know-how for developing other applications for information provision on credit
cards' limits etc."
Since each project is made up of independent applications, a different
application can be used in each installation. For example, the speech synthesis
tools available in Greek, besides "Aesop", are the "Ekfonitis" text-to-speech
tool by the Institute for Language and Speech Processing (ILSP / R.C. "Athena")
and "Demosthenes" by the University of Athens. These programs are under
continuous development to bring their speech quality closer to that of a human
voice. Mr. Skouras pointed out the effort invested in improving "Aesop" for
certain idiomatic cases of the Greek language, while the newer version of the
program, which also offers a female voice, is expected to sound even more
natural.
The advantages of a female voice in speech synthesis were also pointed out by
Dr. Stelios Bakamidis, Head of the Speech Technology Department at ILSP / R.C.
"Athena", who presented the female voice in the latest version of "Ekfonitis".
The tool is available on the market on a CD-ROM, so it is easy for anyone to
test the quality of its speech. Those who have followed its development will
observe how much it has improved from version to version, allowing the fast
development of quality applications.
"Open sesame"
In the summer of 2000, Cosmote, the biggest mobile telephony company in Greece,
launched "MyCosmos", a voice portal where navigation was possible exclusively
through voice. The idea for the project came from "Dialogos", the Greek
representative of Nuance, which specializes in voice applications. Dialogos was
interested in collaborating with a big company in order to promote its
integrated voice technology application.
Development started in October 1999, and the first service that MyCosmos
provided, in the summer of 2000, was information about stocks listed on the
Athens Stock Exchange, using voice recognition for the names of listed
companies. The portal was progressively enriched with other services, which
allowed for less restricted questions. For example, the cinema service allows
questions like: "Where can I watch the 'Gladiator' in [name of area]", which
can be phrased in many different ways. Also, for the well-known card game
Blackjack, the system is expected to understand each likely reaction of the
player, like: "You are cheating", "This is not acceptable" and so on.
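As a rough illustration of what "many variants" means for the developer, the
sketch below maps a few phrasings of the cinema question to the same two pieces
of information, the film and the area. It works on text that is assumed to have
already been recognized, and it uses simple regular expressions in Python. This
is not how MyCosmos or the Nuance tools work internally; the patterns and the
names CINEMA_PATTERNS and parse_cinema_question are hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns covering a few phrasings of the same cinema question.
CINEMA_PATTERNS = [
    re.compile(
        r"where can i (?:watch|see) (?:the )?'?(?P<film>.+?)'? in (?P<area>.+)",
        re.IGNORECASE,
    ),
    re.compile(
        r"which cinemas? in (?P<area>.+?) (?:is|are) showing (?:the )?'?(?P<film>.+?)'?$",
        re.IGNORECASE,
    ),
    re.compile(
        r"is (?:the )?'?(?P<film>.+?)'? (?:playing|showing) in (?P<area>.+)",
        re.IGNORECASE,
    ),
]


def parse_cinema_question(text: str) -> Optional[Dict[str, str]]:
    """Return the film and area named in a question, or None if no pattern fits."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    cleaned = text.strip().rstrip("?.! ").strip()
    for pattern in CINEMA_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return {
                "film": match.group("film").strip(),
                "area": match.group("area").strip(),
            }
    return None
```

Each new phrasing that callers actually use means another pattern to write and
test, which gives a sense of why less restricted questions take more
development effort than recognizing a fixed list of company names.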
"Each service has its own particularities during the development phase. As the
number of services offered increases, the cost of developing new ones
decreases", says Mr. Aggelos Polatos, Product Expert at Cosmote. As new,
innovative services are added, the number of users who call MyCosmos increases
as well. At the end of 2001 there were 5,000 users daily, and it is worth
noting that some of them are older people who are not familiar with technology
or with accessing information.
"One of our basic objectives was to boost the company into new sectors. We
didn't invest purely with economic criteria, but we also calculated the indirect
benefits for our corporate image", says Mr. Polatos. The company is currently
looking at how to exploit that know-how in other sectors, including the
provision of content in other languages, especially in view of the 2004
Olympic Games in Greece. It is also considering the use of speech synthesis
instead of prerecorded messages in cases where the latter become impractical.
Interest from investors
The example of a new Greek company working in voice technologies, which began
operating in early 2001, is characteristic of the prospects of the area and of
how investors see this sector. VoiceWeb was started by a team of scientists who
were initially supported by a Greek incubator, InQLab, while later on SETE
Ventures LP, a venture capital fund of Latsis' Financial Organization,
participated in its capital.
From the beginning VoiceWeb cooperated with a great number of companies, working
towards the creation of an integrated platform providing voice technology
solutions. "It is very important for us to know all available technologies, in
order to exploit the best ones for each application. For instance, in an
application for mobile telephony we would use the Nuance voice recognition tool,
which has been trained in a relevant environment, while for the conventional
telephones we would prefer the Philips voice recognition technology", says Dr.
Nikos Patsis, VoiceWeb CEO.
The Greek market is less aware of the importance of customer support, compared
to other European countries. "There is an innate difficulty in discussing our
proposed solution and its cost with a customer, when he is not aware of the cost
scale of such a market", says Dr. Patsis, while he emphasizes that: "technology
is not an end in itself. Although it is possible to automate several procedures,
it is not profitable to work for months on an application to be used only 10
times a year."
VoiceWeb has developed several applications exploiting voice technologies.
Examples include "Sports Bet" and the game "Blackjack", where voice
recognition, in combination with occasional prerecorded messages, is used for
the system response. At the request of Vizzavi Greece, VoiceWeb has developed
that company's voice portal, which provides diverse services. VoiceWeb also
maintains its own voice portal, where information on weather, sports, cinema,
stocks etc. is available by telephone for the cost of a regular call (telephone
number: +30-210-8108000).
VoiceWeb's plans include the development of an application that will connect
personal email and contacts from Microsoft Outlook with the telephone. Thus,
using speech synthesis technologies, one will be able to listen to emails via
telephone and also compose, forward and otherwise manage emails. A similar
application offered by AOL in the USA has worked out quite well and has almost
2,000,000 subscribers. Along the same lines, the national (GSRT) project EFONO,
undertaken by a consortium of ILSP / R.C. "Athena" and Panafon-Vodafone, has
worked on developing an experimental application to manage emails.
Technology maturity and prospects
The future of voice technologies will be determined by the quality of
applications and by the corporate need for such applications. As Dr. Bakamidis
points out, an important factor in the further growth of language technology
is the funding of research programs by the European Commission and by national
sources. The Institute for Language and Speech Processing is actively involved
in several research programs for the improvement of existing technologies. To
name some EU projects, ERMIS, IMUTUS and MUSA are devoted to a more
sophisticated incorporation of language technologies in more complicated
environments.
More specifically, the ERMIS project aims to model the emotional state of a
user, so that the machine can respond appropriately. IMUTUS offers the
possibility of self-teaching a musical instrument, and also works on music
recognition, while the MUSA project allows the automatic subtitling of films
and news broadcasts. Two of these three projects are coordinated by ILSP / R.C.
"Athena".
Regarding speech synthesis, Dr. Patsis believes that: "The technology will be
mature enough when we are able to generate a whole news bulletin", defining
the problem of speech synthesis systems as a lack of natural intonation, pitch
and prosody. Mr. Skouras shares this opinion, as does Dr. Bakamidis, who
points out: "Technology has advanced considerably and we can 'listen' to it in
every new version of 'Ekfonitis', which is considerably better than the
previous one".
Both Dr. Bakamidis and Dr. Patsis see great prospects for the use of voice
technologies in the public sector. The Greek State falls short in offering
innovative, high-quality services to citizens, although the latest achievements
and the projects either underway or proposed allow for a more hopeful view. "We
see that State Agencies are very interested, mainly for information management,
and this, indeed, is something very encouraging", says Dr. Patsis.
Also very promising is the interest shown by the private sector, especially by
companies that run call centers. Companies such as banks operate these centers
in order to strengthen the services they offer. However, they are aware that
their operational costs could be reduced if they used voice technologies.
Undoubtedly, talking about voice technologies some years ago, especially in
Greece, would have sounded like science fiction, due to the absence of Greek
language technology tools equivalent to the ones that existed for other
languages. Today, despite the fact that there is a lot to be done for the
further improvement of the existing tools and applications, there is tangible
proof of progress. Indeed, the European Union of 15 countries and 13 official
languages relies upon systems that will provide multilingual e-content in real
time and at the lowest possible cost.
**********
Nick Makrymanolakis is a journalist with a background in information
technology, marketing, public relations and advertising. Since 1993, he has
monitored the area of technology applications in business, with emphasis on
information technology and the Internet. Today he is the editor-in-chief of the
Greek business magazine New Economy Observer (www.neomag.gr).
**************
Box 1
Greek companies and agencies
* Aristotle University of Thessalonica - www.auth.gr
* Vizzavi, web portal - www.vizzavi.gr
* VoiceWeb, voice applications integrator - www.voiceweb.gr
* DELTA-Singular, software and IT services company - www.deltasingular.gr
* Dialogos, representative of Nuance in Greece - www.speech.gr
* Dienekis Informatics, IT services company - www.dienekis.gr
* Institute for Language and Speech Processing, research institute - www.ilsp.gr
* Cosmote, mobile telephony provider - www.Cosmote.gr
* Knowledge, representative of Philips voice applications - www.knowledge.gr
* University of Athens - www.uoa.gr
Foreign companies and agencies
* Dialogic, telephone cards maker - www.dialogic.com
* Envox, voice platforms - www.envox.com
* Nuance, voice applications - www.nuance.gr
* Philips, voice applications - www.Philips.com
EU voice technology related projects with Greek participation
* ACCeSS (Automated Call Center through Speech Understanding System)
* BALKANET
* CATCH-2004 (Converse in Athens-2004, Cologne, Helsinki)
* DICOPRO (On-Line Dictionary Consultation For Language Professionals On
Intranet) - issco-www.unige.ch/projects/dicopro_public/
* E-MATTER (E-Mail Access through the Telephone Using Speech Technology
Resources)
* ERMIS (Emotionally Rich Man-machine Interaction Systems) -
www.image.ntua.gr/ermis
* GEMINI (Generic Environment for Multilingual Interactive Natural Interfaces)
* HOPE (HLT Opportunity Promotion in Europe)
* IDAS (Interactive Telephone-based Directory Assistance Services)
* IMUTUS (Interactive Music Tuition System) - www.exodus.gr/imutus
* M-PIRO (Multilingual Personalised Information Objects) -
www.ltg.ed.ac.uk/mpiro
* MULTITRAIN (An Integrated Platform For Multimedia Skilled Workforce
Enhancement By Providing Focused Training In Digital Content)
* MUSA (Multilingual Subtitling of Multimedia Content)
* ONOMASTICA (Multi-Language Pronunciation Dictionary of Proper Names)
* ORIENTEL (Multilingual access to interactive communication services for the
Mediterranean and the Middle East) - www.orientel.org
* SPEECHDAT (Speech Databases for Creation of Voice Driven Teleservices) -
speechdat.phonetik.uni-muenchen.de
****************
Box 2
Glossary
* Interactive Voice Response (IVR): Technology used in automated systems that
are contacted via telephone. The user states his or her request verbally or
with the telephone keypad.
* Speech Recognition: The automatic process by which spoken words are
recognised and converted into text or commands that a system can act on.
* Speech Synthesis: The automatic process of turning electronic text into
synthetic speech, based on phoneme manipulation.
* Phoneme: The smallest unit of sound that distinguishes one word from another
in a spoken language or dialect.
******************
Quotes from interviews
"Technology is not an end in itself. Although it is possible to automate several
procedures, it is not profitable to work for months on an application to be used
only 10 times a year."
Dr. Nikolaos Patsis, CEO, VoiceWeb
---------
"One of our basic objectives was to boost the company into new sectors. We
didn't invest purely with economic criteria, but we also calculated the indirect
benefits for our corporate image."
Aggelos Polatos, Product Expert, Cosmote
-------
"Technology has advanced considerably and we can 'listen' to it in every new
version of 'Ekfonitis', which is considerably better than the previous ones."
Dr. Stelios Bakamidis, Head of the Speech Technology Department at the
Institute for Language and Speech Processing