
It's getting easier to add audio to ATMs

The bad news: Triple DES isn't the only regulation impacting ATMs in the near future. The good news: technology advances are making it easier to add audio to ATMs.

August 20, 2003

With much of the ATM world preoccupied with upgrading its machines and infrastructure to comply with Triple DES mandates by 2005, some have lost sight of another regulatory requirement designed to help visually impaired users of ATMs.

The Department of Justice is currently reviewing the federal Access Board's list of recommendations to make ATMs more accessible, including a requirement for audio capability. (See related story ATM industry welcomes new ADA draft)

At an ATM Channel Planning seminar sponsored by NCR earlier this summer, John Wodatch, an attorney with the DOJ's Civil Rights Division, predicted the DOJ will produce its proposal, which will determine when and how the Access Board's recommendations are implemented, in early 2004.

It will likely take another year or so for the proposal to become part of the Americans with Disabilities Act (ADA), he said.

Dean Stewart, director of product development for Diebold, said some ATM owners are cutting down on visits to machines by implementing audio and Triple DES capabilities at the same time. This approach is cheered by Tim Hoyle, a senior consultant with IRB Consulting.

"(ATM owners) shouldn't be dragging their heels on this," said Hoyle, whose Millsboro, Del.-based consulting firm is assisting Sovereign Bank with its audio ATM program.

Hoyle doesn't believe the DOJ will give deployers a large window of opportunity to implement audio at ATMs. "This issue has been out there a long time, and manufacturers have proven they have the technology," he said.

The technology to make ATMs "talk" is readily available. It generally involves adding software, an audio jack and sound cards to ATMs. In some cases, a faster processor and more memory are needed to support WAV files or text-to-speech engines, the two methods used to add audio to ATMs.

Hoyle said Sovereign's experiences have included everything from minimal upgrades to "replacing the entire guts of the machine," primarily because of the diverse nature of its ATM fleet after several acquisitions.

Sovereign has rolled out 145 audio-enabled ATMs since last December and plans to add another 200 or so this year, Hoyle said. Some banks began introducing audio-enabled ATMs as early as 2000. (See related story Vision for the future)

"I think that smaller institutions will be able to leverage a lot of the footwork that's being done now," Hoyle said.

Easier does it

Most industry watchers seem to agree that technology improvements -- particularly a combination of Windows-based ATMs and text-to-speech software -- are making the process easier. Text-to-speech software generally uses a computer-generated synthetic "voice" rather than a human one.

Wells Fargo first introduced audio using WAV files in 2000, said Jonathan Velline, senior vice president of ATM banking. The bank switched to a text-to-speech engine when it began adding Windows-based ATMs in 2001. Audio is now available on some 3,800 Wells Fargo ATMs -- about 70 percent of the bank's network.

Velline said text-to-speech software affords greater flexibility to provide dynamic content such as account balances. It also eliminates the need for live "talent." Finding and recording a human voice to produce WAV files "adds another level of complexity" to the process, he said.

"Talent" is also more expensive than synthesized text-to-speech -- although Wells defrayed some of those costs by employing a member of its own marketing department. His voice can still be heard on the 100 to 200 older ATMs that use WAV files for audio, Velline said.

Text-to-speech is generally not an option in an OS/2 environment. "WAV files are the best solution for an OS/2 platform," said Diebold's Stewart. "While we were able to do text-to-speech under OS/2, it was difficult to get speech that was understandable."

The human touch

In addition to its compatibility with an OS/2 platform, the primary advantage of WAV files is the warmer, more natural quality of the voice.

Some text-to-speech engines sound more natural than others. Chris Spencer, chief executive of Wizzard Software, a reseller of both AT&T's Natural Voices and IBM's text-to-speech software, said that AT&T's product, which produces speech by concatenating (or linking) sounds from a prerecorded database, sounds more "human." However, files created this way are larger and require far more memory.

Another option, said Kevin Carroll, director of ATM products for Concord EFS, is using text-to-speech software to generate WAV files. The result is a more natural sounding voice that can run in an OS/2 environment -- but with more flexibility and at less cost than hiring human "talent."

Concord can develop a "script" of ATM transactions for its clients, Carroll said. After the client signs off on the script, Concord runs it through a text-to-speech engine and produces a CD of the WAV files. The client reviews the CD in either their own test lab or Concord's and makes any necessary changes before a final version is produced.

Diebold's Stewart said more time and effort may be required for script development when a text-to-speech engine is used. Many words must be spelled out phonetically, for instance, to ensure a correct pronunciation.
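The phonetic respelling Stewart describes can be pictured as a small substitution pass over the script before it reaches the speech engine. The sketch below is illustrative only: the lexicon entries and the function name are assumptions, not any vendor's actual word list or API.

```python
import re

# Hypothetical pronunciation lexicon: script spelling -> phonetic respelling.
# These entries are examples only; a real script would need its own list.
PRONUNCIATIONS = {
    "ATM": "A T M",
    "PIN": "pin",
}


def apply_pronunciations(text: str, lexicon: dict = PRONUNCIATIONS) -> str:
    """Replace whole words found in the lexicon with their respellings."""
    if not lexicon:
        return text
    # Longest entries first so a longer entry wins over a shorter prefix.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: lexicon[match.group(1)], text)


prompt = apply_pronunciations("Please enter your PIN at this ATM")
print(prompt)
```

Matching on word boundaries keeps the pass from rewriting letters inside longer words, which is one reason script review still takes time.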

Stewart suggested that deployers may want to use WAV files for relatively static ATM content like the attract loop and employ text-to-speech for screen sequences where changes are more likely to occur.
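One way to picture the split Stewart suggests is a lookup that sends fixed screens to prerecorded WAV files and anything carrying changeable text to the speech engine. The following is a minimal sketch under stated assumptions: the screen names, file names and the `plan_audio` function are hypothetical and do not come from Diebold.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping of static screens to prerecorded WAV files.
STATIC_PROMPTS = {
    "attract_loop": "attract_loop.wav",
    "enter_pin": "enter_pin.wav",
}


@dataclass
class AudioPlan:
    method: str   # "wav" for a prerecorded file, "tts" for synthesized speech
    payload: str  # a file name for "wav", the text to speak for "tts"


def plan_audio(screen_id: str, dynamic_text: Optional[str] = None) -> AudioPlan:
    """Choose WAV playback for static screens and text-to-speech otherwise."""
    if dynamic_text is not None:
        # Content that changes per transaction goes to text-to-speech.
        return AudioPlan("tts", dynamic_text)
    if screen_id in STATIC_PROMPTS:
        return AudioPlan("wav", STATIC_PROMPTS[screen_id])
    raise KeyError(f"no audio defined for screen {screen_id!r}")


print(plan_audio("attract_loop"))
print(plan_audio("balance", "Your available balance is 125 dollars"))
```

Under this arrangement, adding a new dynamic screen needs only new text, while a new static screen still needs a new recording.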

Size matters

Unlike text-to-speech files, which in some cases are generated by software resident at the ATM, WAV files must be manually downloaded. They are much larger than text-to-speech files, and new WAV files must be produced any time an ATM function that requires a new screen is added.

"The larger the files, the longer (a download) is going to take," Carroll said, noting that most deployers minimize ATM downtime by scheduling these loads during early hours of the morning and other slow times.

"Where a text-to-speech file that says 'please enter your PIN' would take 20 bytes, a WAV file saying the same thing might take 20 kilobytes," said Bill Jackson, Triton's chief technical officer.
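Jackson's comparison can be checked with rough arithmetic. The sketch below assumes plain ASCII text for the prompt and an uncompressed WAV recording of about 2.5 seconds at 8 kHz, 8-bit mono with a 44-byte header; those audio settings are illustrative assumptions, not figures from Triton.

```python
PROMPT = "please enter your PIN"


def text_size_bytes(text: str) -> int:
    """Size of the prompt stored as plain ASCII text."""
    return len(text.encode("ascii"))


def wav_size_bytes(seconds: float,
                   sample_rate_hz: int = 8000,   # assumed telephone-quality rate
                   bytes_per_sample: int = 1,    # assumed 8-bit samples
                   channels: int = 1,            # assumed mono
                   header_bytes: int = 44) -> int:
    """Approximate size of an uncompressed PCM WAV file."""
    audio_bytes = int(seconds * sample_rate_hz * bytes_per_sample * channels)
    return header_bytes + audio_bytes


text_bytes = text_size_bytes(PROMPT)
wav_bytes = wav_size_bytes(2.5)
print(f"text: {text_bytes} bytes, WAV: {wav_bytes} bytes")
```

With these assumptions the text comes to 21 bytes and the recording to 20,044 bytes, in line with the roughly thousandfold gap Jackson describes. Higher sample rates or longer prompts would widen it further.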

Triton is using text-to-speech software for its PC-based models like the 9800, RL5000 and FT5000, as well as those with embedded processors like the 9100 and 9700. Adding audio to non-PC machines is more costly -- probably "a few hundred dollars" -- because a hardware change is required, said Jackson, who has authored a white paper on the issue.

Wells Fargo found that the visually impaired members of its focus groups expressed no preference for one type of software over another. "For them, it's the content rather than the sound of the voice itself that's important," Velline said.