9/96 Head to Head: Speech-Recognition Software

Friends, Romans, PCs: Lend Me Your Ears

  • Coming Soon to a PC Near You

    By James E. Powell, Northwest Bureau Editor

    Back in the Stone Age, you dictated a memo to your secretary. Today, you can dictate it to your word processor, thanks to speech-recognition software. Although they're often classed as aids to the disabled user, or as tools available only to narrowly defined professions, these speech-to-text products are about to hit the mainstream.

    I examined the latest products from two category leaders, Kurzweil's Voice 2.0 and IBM's VoiceType Dictation 3.0 for Windows 95, and found they've gone from being yesterday's playthings to today's everyday tools. Both make it possible to "speak" your text more quickly and, most important, more accurately than you can input it on a keyboard. I type more than 70 words per minute, but I can dictate much faster, without worrying about spelling errors. These two products are stunningly accurate and, frankly, a lot of fun to use.

    Each product supplies a microphone headset that you plug into your sound card. You install the software, strap on the headset and begin talking. While both products can be used out of the box, the best results come after you "enroll" the system, training it to recognize your own peculiarities of speech. After enrollment, the software compiles your speaker profile, a step that took about an hour on my AST Pentium 200. But training doesn't stop there; the programs continue to learn as you dictate.

    Kurzweil Voice starts its enrollment session by asking you for a volume check. It tests for background noise and voice level, then has you pronounce a series of 33 numbers and 400 simple words that emphasize a variety of hard-to-catch sounds such as "p" and "d." I spent about 35 minutes at it, and by the 100th word I was growing weary.

    VoiceType asks you to speak sentences, a more natural process, and you can choose to enroll with different levels of intensity. Of course, the more you speak, the more accurate the results. During the last stages, you're reading a ghost story by Mark Twain. It's certainly more interesting than reciting words and numbers, but a really thorough training job with VoiceType takes about 90 minutes.

    Both systems offer two dictation styles, depending on what you're doing. You give system commands using continuous dictation, without pauses between words. To put text on the screen, you need to practice "discrete" dictation, that is, adding a sharp, short pause between each word ("as.if.they.were.separated.by.periods"), which gives the software definite starting and stopping points. The documentation insists that you need only about a one-sixth-second pause between your words; in my tests, it took closer to a half-second pause.

    Once I'd enrolled them, I tested both programs by dictating a variety of documents into Microsoft Word, using the microphone in each package and the same computer and Sound Blaster-compatible sound system.

    As you speak into the microphone, Kurzweil Voice displays your words in a small window. It offers numbered suggestions for sounds it doesn't understand, and you choose the correct word from the list by saying, for instance, "Take 2" to accept the second choice. If the program can't come up with the proper word, you can type it in yourself. Either way, the software will store the sounds and word for future reference, in effect training itself.
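
    The "Take 2" interaction described above can be pictured with a small sketch. This is only an illustration, not Kurzweil's code: the function name apply_take_command and the sample suggestion list are hypothetical, and the real product works on audio, not on text strings.

```python
import re

def apply_take_command(command, suggestions):
    """Return the suggestion picked by a spoken 'Take N' command, or None.

    Hypothetical helper: 'command' is the recognized phrase as text,
    'suggestions' is the numbered list shown to the speaker.
    """
    match = re.fullmatch(r"take (\d+)", command.strip().lower())
    if not match:
        return None  # not a 'Take N' command at all
    index = int(match.group(1)) - 1  # spoken choices are numbered from 1
    if 0 <= index < len(suggestions):
        return suggestions[index]
    return None  # number is outside the list


# Made-up suggestion list for a misheard word.
choices = ["comes plague", "complex", "compacts"]
print(apply_take_command("Take 2", choices))
```

    A number outside the list returns None, which stands in for the case where you would type the word yourself.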

    At first, it's tempting to simply type the correct word into your word processor and go on, but this bypasses the ongoing (re)training portion of the program. Instead you must tell Voice to "delete that," watch it erase the word, and then start over.

    There are dozens of commands to simulate keys or move around an application, too: "Enter key," "down arrow" and "page down" do just what you'd expect. You can move to the next word or page, to the beginning of the paragraph, or to the top of the document. "Open Word" started Word and opened a new document. I didn't even have to tell Voice where Word was located. You can toggle the microphone on and off by pressing Ctrl+Alt or clicking on the microphone icon, or by saying "Stop listening." Just say "Listen to me" to resume dictation.

    IBM VoiceType's installation adds two buttons to the Word toolbar plus a Dictation menu. The program works with "trigrams," groups of three words, rather than individual words, which gives it the ability to do limited context checking. For example, it can select the correct homophone (two, to, too) for "far too much." But this can be disconcerting, especially when the program goes back to correct something you spoke two or three words ago.
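
    To see how three-word context can settle a sound-alike choice, consider this toy sketch. It is an assumption-laden illustration, not IBM's algorithm: the counts in TRIGRAM_COUNTS are invented, and pick_homophone is a hypothetical name.

```python
from collections import Counter

# Invented trigram counts; a real system would learn these from a large
# body of text. A Counter returns 0 for any trigram it has never seen.
TRIGRAM_COUNTS = Counter({
    ("far", "too", "much"): 12,
    ("went", "to", "the"): 30,
    ("the", "two", "of"): 9,
})

def pick_homophone(previous_word, candidates, next_word):
    """Choose the candidate whose (previous, candidate, next) trigram is most common."""
    return max(
        candidates,
        key=lambda word: TRIGRAM_COUNTS[(previous_word, word, next_word)],
    )


print(pick_homophone("far", ["two", "to", "too"], "much"))  # too
```

    Because the choice depends on the word that follows, a program working this way cannot settle on "too" until it has heard "much," which is consistent with the after-the-fact corrections described above.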

    VoiceType also lets you go back and retrain your system on any of the last 20 words you've spoken. There is no pop-up window of alternatives as you dictate, so when you want to make a change, you have to select a menu option or right-click on the word. VoiceType stops dictation mode and plays back your voice for the selected word; you choose from a list of alternatives (or type the word yourself) and then resume dictating. It's a more time-consuming process, and since it isn't voice-controlled, you must frequently switch between voice and keyboard.

    Although VoiceType isn't as versatile out of the box as Kurzweil Voice, many of its navigation commands are extremely useful. There's a single command to move to a previous period, for example, and the program's printed Quick Reference is very helpful.

    Kurzweil Voice made me speak each word softly and in a choppy fashion so I could edit any mistakes; VoiceType let me speak faster and in a more normal mode, but I kept looking back to make sure words were correct.

    Kurzweil Voice is more sensitive to room noise. When I started to type a filename, the microphone picked up the keyboard clicks and inserted numbers and dashes into the filename. Ironically, when my fax machine started up, there was no problem with the noise it generated, until the final beep, which Voice interpreted as "history" and "kept."

    Both programs are very, very smart. Although they didn't always score 100 percent (for some reason, each program had trouble learning my name, "Powell"), once I'd trained either one I never saw that particular mistake again. Some of the first-time errors provided comic relief; "complex" came out as "comes plague," while "I think" showed up as "icing" and "Viking." Each program missed the same words, but made up for it with surprisingly large, customizable dictionaries to hold corrections.

    Each lets you create voice macros, in which a spoken phrase triggers a series of actions. VoiceType offers templates you can use to create pauses for data entry. Although the program will display the macro name when you pronounce that name, it won't substitute the actual keystrokes until you dictate something else.

    With both programs, you speak the punctuation ("period," "comma"). Ironically, only Microsoft Word is supported natively by VoiceType; if you use VoiceType to speak into IBM's own word processor, Lotus Word Pro 96, many words don't appear until you speak an entire phrase, a particularly confusing situation. You'll want to use IBM's VoicePad mini word processor for just about all your non-Microsoft Word applications.

    You can speak numbers continuously ("one two three point four five"). Spelling, however, is another story. VoiceType lets you enter a spelling mode using the Quick Dictation window, where you dictate the letters in the word. Unfortunately, this mode isn't available from within Microsoft Word. Otherwise, both it and Kurzweil Voice require you to spell out a word using a special dictionary of words representing letters (Alpha, Bravo, Charlie, etc.). And VoiceType recognized common letter combinations, such as "IBM," as words.
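
    Both conventions amount to simple word-to-character lookups, sketched below. The helper names are hypothetical, and the letter list assumes the familiar radio alphabet beyond the three code words the products are quoted as using; neither vendor's actual dictionary is reproduced here.

```python
# Spoken digit words, plus "point" for the decimal separator.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "point": ".",
}

# Assumed code words: each one stands for its own first letter.
CODE_WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliet kilo "
    "lima mike november oscar papa quebec romeo sierra tango uniform "
    "victor whiskey xray yankee zulu"
).split()
LETTERS = {word: word[0] for word in CODE_WORDS}

def spoken_number(phrase):
    """Turn 'one two three point four five' into '123.45'."""
    return "".join(DIGITS[word] for word in phrase.lower().split())

def spoken_spelling(phrase):
    """Turn 'Papa Oscar Whiskey ...' into the letters the code words stand for."""
    return "".join(LETTERS[word] for word in phrase.lower().split())


print(spoken_number("one two three point four five"))        # 123.45
print(spoken_spelling("Papa Oscar Whiskey Echo Lima Lima"))  # powell
```

    An unknown word raises a KeyError here; a dictation product would more likely ask you to repeat it.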

    VoiceType's variety of entry modes and techniques is confusing, while Kurzweil Voice's approach is more straightforward. Kurzweil's controls remain the same in any application, and support for more than two dozen apps comes right in the box. Though VoiceType lets you speak more quickly and correct errors in previous words, it is inconsistent from application to application, which makes it harder to take true voice command of your system.

    Both programs support multiple users per machine, though each speaker requires extra disk space. Big Blue offers four additional VoiceType vocabulary sets for $499 each, covering journalism, emergency medicine, radiology and legal work.

    For my money, Kurzweil's consistency is king. That, and its easier control over system and applications, makes Kurzweil Voice the better choice.

    --Info File--
    Kurzweil Voice 2.0
    Price: $695; upgrade, $295
    Pros: Excellent recognition
    Cons: Less forgiving of soft or slurred speech
    Platforms: Windows 95, 3.1x
    Disk Space: 35MB
    RAM: 16MB for 30,000-word vocabulary; 24MB for 60,000 words
    Kurzweil Applied Intelligence
    800-380-1234, 617-893-5151
    WinMag Box Score: 4.5

    IBM VoiceType Dictation 3.0 for Windows 95
    Price: $699
    Pros: Good recognition; near-natural speech patterns
    Cons: Trigrams are distracting; less-capable system control
    Platforms: Windows 95
    Disk Space: 26MB
    RAM: 16MB
    IBM
    800-TALK-2-ME
    WinMag Box Score: 3.0
