Voice Dictation - Digital Systems

ELC Syllabus - ASNR 42nd Annual Meeting

By Hervey D. Segall, MD

Hervey D. Segall, MD has reported no financial interest, arrangement or affiliation with a commercial organization that may have a direct or indirect influence in the subject matter of this presentation.

Objectives

Contents

  1. Preliminary Remarks
  2. Commercial Voice Dictation Software Programs
  3. The capabilities of sophisticated Voice Dictation Software
  4. Speaking rate and accuracy
  5. Your Purchase and Getting Started
  6. Equipment-hardware considerations
  7. Creation of Speaker Profiles and Training your software program
  8. Recognition Modes
  9. Setting up for the Dictation Session (Dictation Mode)
  10. Spelling Mode
  11. Command Mode
  12. Steps toward satisfactory dictation and creating word documents
  13. Problems
  14. Backing up and Exporting a Speaker Profile

 

1 Preliminary Remarks

Digital voice recognition systems have great potential for assisting the neuroradiologist in his/her work. They can be useful even now, if one understands the technological requirements and has the patience to install, optimize, and train a program. There is a relatively small time investment involved. However, a digital voice recognition system can pay off if you do not have constant access to an excellent secretary and if your typing skills or inclinations leave something to be desired. Furthermore, using a digital voice recognition system can be a nice change of pace in your composition of written information and it lends itself particularly well to tasks of certain types and magnitudes.

Voice recognition systems enable the conversion of spoken words into text. Speech is first digitized and then matched against a dictionary of coded waveforms. Matches are converted into text as if the words were typed on a keyboard.
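The matching step can be pictured with a toy sketch. The Python below is an illustration only and is not how any product named in this syllabus works: the feature vectors, the `TEMPLATES` dictionary, and the `recognize` function are all hypothetical, and commercial engines rely on far more elaborate statistical models.

```python
import math

# Hypothetical "dictionary of coded waveforms": each word maps to a short
# feature vector. Real systems store much richer acoustic models.
TEMPLATES = {
    "brain": [0.9, 0.1, 0.3],
    "spine": [0.2, 0.8, 0.5],
    "normal": [0.4, 0.4, 0.9],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates=TEMPLATES):
    """Return the word whose stored template is closest to the input."""
    return min(templates, key=lambda word: distance(features, templates[word]))


if __name__ == "__main__":
    # A digitized utterance that is close to, but not identical to, "spine".
    print(recognize([0.25, 0.75, 0.45]))
```

The closest template wins even when the match is imperfect, which is one way to see why a noisy input signal or an untrained voice can produce the wrong word.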

There are simple speaker-independent voice-recognition systems that can recognize limited vocabularies, such as digits and a handful of words. Such systems have replaced human operators for telephone services such as collect and credit card calls. However, the more sophisticated systems do require that users enunciate samples into the system in order to tune it to their individual voices.

The three types of voice recognition applications are continuous dictation systems, discrete dictation systems, and command systems.

Continuous voice recognition systems are more advantageous than discrete voice recognition systems, which can also be used for dictation but require a pause between each word. Speaker-independent continuous systems that can handle large vocabularies are expected to become mainstream in the 2000s. There are also command systems that can recognize a few hundred words and eliminate the need for a mouse or keyboard to enter repetitive commands. These command systems are less taxing on the computer than the other voice recognition applications mentioned.


2 Commercial Voice Dictation Software Programs

2.1 Radiological Dictation Software
There are a number of software programs that are being sold on a professional level for radiologic dictation. I have been using ProgRIS for the better part of the past year. ProgRIS Voice Recognition is a speech recognition program that enables the user to create diagnostic text reports quickly and easily. Radiologists can dictate reports directly and instantly into text, correct and sign reports, and store them. The user speaks continuously and at a natural pace, with the text appearing on the screen as the user speaks. The system has a very high accuracy rate and accounts for differences in speech patterns. It enables the user to reduce the average time to complete a radiology report and gives the user greater control over the entire reporting process. The real-time responsiveness allows radiologists to edit and correct their own reports directly without the need for a transcriptionist. However, the current version of ProgRis I have been using is a very limited and inflexible program compared to some of the newer commercial programs described in the next section - it is sorely lacking in that it is not set up for ongoing training and in that its vocabulary is very modest.

2.2 Commercial Dictation Software
Choosing amongst store-available voice recognition software suites can be difficult. Lernout & Hauspie Voice Xpress 5.0 competes with Dragon Systems' NaturallySpeaking, IBM's ViaVoice, and FreeSpeech 2000 from Philips Speech Processing. However, the choices are getting smaller. Just last year Lernout & Hauspie announced plans to acquire rival Dragon Systems. Lernout & Hauspie had expected to produce and support both companies' product lines initially, but the intent was to have a single product. However, on December 12, 2001 ScanSoft, Inc. announced that it had closed the acquisition of substantially all the operating and technology assets of the Speech and Language Technologies business of Lernout & Hauspie (L&H). Consideration for the transaction comprised $10 million in cash, a $3.5 million note, and 7.4 million shares of ScanSoft stock.

2.2.1 Voice Xpress
Lernout & Hauspie's (L&H) Voice Xpress Version 5, the newest version of this popular program, now sets up faster, is more accurate, and is easier to use than earlier versions. The company claims that you can be up and running in about 15 minutes and that's fairly accurate. The most time-consuming part of the process involves reading a 5-minute voice training script. Reviewers indicate that it definitely makes fewer mistakes than previous versions. As with all speech programs, the more you use Voice Xpress, the more precisely it attunes itself to your voice and vocabulary. One must use it for several weeks to enjoy maximum accuracy. One useful new feature is a filter to automatically eliminate "ums" and "ahs" that clutter dictation. L&H has also done an excellent job of matching Voice Xpress to Office 2000. The so-called Sample Commands are tailored for your application and include a nice collection of typical voice commands useful for carrying out common tasks. It has a host of Web-centric features and broader support for office applications. It also has a new text-to-speech application called RealSpeak that, as its name suggests, reads back text. One reviewer found that performance was sluggish on a P II-333 with 128 MB of RAM. On a P III-600 system with the same amount of main memory, however, the reviewer enjoyed virtually instantaneous recognition. Lernout & Hauspie's web address is http://www.lhsl.com/.

2.2.2 NaturallySpeaking
Lernout & Hauspie's Dragon NaturallySpeaking was the leading consumer speech recognition software program during its first year, with 40 percent of all unit sales. As L&H's second consumer speech recognition product, L&H debuted NaturallySpeaking 5.0 not long ago and version 6.0 is now already out.

The NaturallySpeaking 5.0 program features a new user interface, a host of usability improvements, and support for Intel's upcoming Pentium 4 microprocessor. There are many versions available including L&H Dragon NaturallySpeaking® Preferred, Preferred USB, Standard, Essentials, Professional Solutions, Legal Solutions, Medical Solutions, and Public Safety Solutions. Software can be purchased for multiple languages. These new versions of L&H Dragon NaturallySpeaking offer increased accuracy and enhancements such as improved support for e-mail and Internet browsing. These features and others, based on extensive customer feedback, make L&H Dragon NaturallySpeaking version 5 more intuitive and even easier to use. Popular programs can all be controlled with your voice. New commands such as "Check for New Mail," "Send Email," and "Reply to message" are supported. Many of the new features in version 5 are designed to help computer users discover the productivity benefits of using speech with their Microsoft® Windows® based applications. For example, the new "Quick Correct" feature speeds proofreading and correction. A new Quick Start guide provides step-by-step instructions, quickly moving users from installation to their first dictation. Finally, the "Add Words from Document" feature improves accuracy by quickly adding new words to the L&H Dragon NaturallySpeaking vocabulary from existing documents. This feature improves accuracy and customizes Dragon NaturallySpeaking version 5 to each individual user. L&H Dragon NaturallySpeaking version 5 products claim to require as little as five minutes of voice training.

Dragon NaturallySpeaking® 6 brings together the best of the L&H and Dragon NaturallySpeaking® text-to-speech and speech recognition technology. In addition to assisting users to operate a computer hands-free, the software contains many new features and general improvements. Dragon NaturallySpeaking® 6 claims to be more accurate than ever, bringing improved out-of-the-box speech recognition performance to a product already known for its high recognition accuracy. New features include Nothing But Speech (NBS)™, a tool that filters out fillers and sounds between dictation, such as "uhms" and "ahs", to avoid insertion of unwanted words. Version 6 also features new modes for spelling, numbers, commands only, and dictation only, which increase accuracy when doing specific tasks by voice.
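The effect of a filler filter can be imitated on finished text. The sketch below is only an analogy: the products described here presumably filter at the audio stage, and the `FILLERS` set and `strip_fillers` function are hypothetical names chosen for illustration.

```python
# Hypothetical filler list; the real products' lists are not documented here.
FILLERS = {"um", "uhm", "uh", "ah", "er"}


def strip_fillers(transcript):
    """Drop standalone filler words, ignoring case and attached punctuation.

    Punctuation attached to a filler (for example "uh,") is dropped with it.
    """
    kept = []
    for token in transcript.split():
        core = token.strip(".,;:!?").lower()
        if core not in FILLERS:
            kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_fillers("The um ventricles are uh, normal in size"))
```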

Version 6 contains several new accuracy aids, including the "Acoustic Optimizer", a powerful tool that lets Dragon NaturallySpeaking® software process and learn from all the data it has collected over time. Although the software learns every time you correct errors, running the Acoustic Optimizer will enable it to compile and analyze your data in one sweep, a much more effective process than gradual adaptation from each correction. Just set the Acoustic Optimizer to run while you are away at a meeting and you will return to improved recognition. It also has a tool that lets you add names from your Lotus Notes® or Microsoft® Outlook® address book to your vocabulary and scans sent e-mail so that Dragon NaturallySpeaking® 6 software can learn the words that you like to use and the way that you use them.

Version 6 allows you to easily edit by voice in more places. Select-and-Say™ functionality, which enables you to make text changes easily by voice, is now available in Open and Save dialogs in many applications, the "Find" dialog, and many more places. Version 6 makes corrections by voice easier when you proofread and correct your work with the improved correction menu, which is now available wherever you can select text by voice. You can also learn from your corrections. You can now select and correct a misrecognized word using voice, mouse, or keyboard-without going into a correction window-and Dragon NaturallySpeaking® 6 software can still learn from your correction. The new correction workflow allows experienced users who prefer dictating an entire document before proofreading to quickly move through the document phrase by phrase and make corrections as needed. This feature is also helpful for transcriptionists.

Using version 6 you can find commands more easily than before and create your own commands quickly and easily. A variety of command creation tools are now available to help you increase your productivity. You can create text blocks, including graphics such as bitmaps, and insert them into documents or e-mails with a single voice command. For example, you can design a standard letter closing, including your signature, and add it to all your correspondence. It also features a Macro Recorder. If you repeat the same action frequently, you can record your mouse movements, clicks, and keystrokes; assign a name to the recorded sequence; and recall the action with the command any time. A Microsoft® VBA-compatible scripting tool lets you build complicated macros and even voice-enable applications that you may want to use. This is the full-fledged customization tool for programmers. (Dragon NaturallySpeaking® scripting language commands from previous versions of the software are also supported.)

Dragon NaturallySpeaking® 6 lets you listen to text in the most human-sounding voice available. Dragon NaturallySpeaking® software now includes the award-winning L&H™ RealSpeak™ text-to-speech software. Listen to your computer read e-mails, memos, and other text in a human-sounding voice. You can save your dictation for outside transcription: Dictate into Dragon NaturallySpeaking® software, save a recording of your dictation with your document, and then send your file elsewhere to be corrected. This improved third-party correction feature is now available in Microsoft® Word and Corel® WordPerfect® word processors, as well as the DragonPad. You can export and import your user files with the click of a button. Store your user files in any network directory or on removable media, such as a USB memory device, CD/RW disk, or Zip disk. Users can easily back up user files for safekeeping or to transfer them to another computer without copying files manually. This feature is especially helpful for users who wish to dictate at multiple workstations.

Finally, users who wish to create a limited, special-purpose vocabulary can now build a custom dictation vocabulary from scratch, adding only the words that they need for their work.

2.2.3 ViaVoice
IBM's ViaVoice 8.0 works seamlessly with Microsoft applications and is significantly more accurate, according to IBM representatives. It is being shipped in four editions. The high-end $200 Pro Edition includes a Universal Serial Bus headset with optional analog jacks for PC sound cards. On the low end is the $30 Personal edition, which ships with a standard headset. IBM has added support for a Universal Serial Bus microphone, which is particularly good for owners of budget PCs and notebook computers with substandard sound cards, representatives say. IBM claims its USB microphone enhances audio quality and thus reduces error rates by as much as 30 percent. IBM is also focusing attention on allowing people to control their PCs using voice commands. For example, you can use the ViaVoice Document feature to dictate a fax message. At your instruction, ViaVoice launches a Microsoft Word for Windows fax template that you complete. By using System Navigation macros, you can create personal voice commands to navigate Windows. IBM says it has cut in half the time it takes to get started using ViaVoice for Windows. To improve accuracy, a new feature called ViaVoice Marks uses text-to-speech technology to let you hear playbacks of audio commands before or after they are executed. The ViaVoice requirement for 460 MB of free disk space will be a problem for users who have relatively limited hard drive capacity.

2.2.4 Macintosh Voice Recognition
IBM has ventured onto Mac turf, demonstrating its ViaVoice voice-recognition program for the Macintosh and building on its experience with the program on Windows platforms. MacSpeech expects to ship iDictate, based on the Philips FreeSpeech 2000 engine, mid-year. The next-level product using Philips' technology is iListen, which supports editing, formatting, and simple speech navigation as well as dictation. You will be able to dictate text, edit it, and format it by voice, say MacSpeech officials. The software will let you use your voice to create a text macro.

3 The capabilities of sophisticated Voice Dictation Software

When you use modern voice recognition products to dictate, the words you say appear automatically on the screen and with the proper spacing. They also enable dictation of punctuation, numbers, dates, times, currency, etc. in an intuitive way automatically adding the correct formatting for you.
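The way dictated punctuation turns into formatted text can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `SPOKEN_PUNCTUATION` table and `format_dictation` function are hypothetical, the command names are examples rather than any product's documented vocabulary, and a real program must also handle the case where you want the literal word "period".

```python
# Hypothetical mapping from spoken tokens to symbols.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "colon": ":",
    "question mark": "?",
}


def format_dictation(spoken):
    """Replace spoken punctuation names with symbols attached to the prior word."""
    words = spoken.split()
    out = []
    i = 0
    while i < len(words):
        one = words[i].lower()
        # Two-word candidate such as "question mark", when a next word exists.
        two = one + " " + words[i + 1].lower() if i + 1 < len(words) else None
        if out and two in SPOKEN_PUNCTUATION:
            out[-1] += SPOKEN_PUNCTUATION[two]
            i += 2
        elif out and one in SPOKEN_PUNCTUATION:
            out[-1] += SPOKEN_PUNCTUATION[one]
            i += 1
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(format_dictation("no mass comma no hemorrhage period"))
```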

Command recognition is also applied to the preparation of word documents. Simple software commands can be added to format your document - you can use simple commands to underline or italicize phrases. When using parentheses you can prefix the parenthesis command on L&H with the words "right" and "left", and you can insert quotation marks with relative ease. It also seems to do well inserting numerals for dictated numbers. Voice commands will also enable you to navigate up and down your document.

Some commands work better than others do. The use of the command mode for capitalizing words with L&H Voice Xpress version 5 has been quite erratic in my experience. Nevertheless, you can easily highlight a title in capitals in totality by using the suffix command "uppercase that".

Furthermore, these more recent software programs, as indicated above, may let you control the Microsoft Windows operating system, including all other Windows-based applications on your desktop, with the sound of your voice. Most of the navigation, formatting and editing commands are global; that is, you can say them in virtually all applications. L&H and ViaVoice can be installed for use with Windows 95 through Windows Millennium.

A feature of ViaVoice is the ability to insert macros into your dictated documents. Macros are shortcut commands that you may create for inserting text directly into the documents. A dictation macro is a command associated with a block of text that you define. When you say that command during a dictation, the associated block of text is inserted into your dictation. If you use the same text in many dictations, a dictation macro for that text makes it easier to place the text into your dictation.
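Conceptually, a dictation macro is a lookup from a spoken command to a stored block of text. The sketch below illustrates that idea only; the `MACROS` table, the command names, and the `expand_macros` function are hypothetical and do not reflect how ViaVoice stores or triggers its macros.

```python
# Hypothetical macro table: spoken command -> block of text.
MACROS = {
    "normal brain": "The ventricles and sulci are normal in size and configuration.",
    "sign off": "Thank you for referring this patient.",
}


def expand_macros(dictated_phrases, macros=MACROS):
    """Replace any phrase that names a macro with its associated text block."""
    return [macros.get(phrase.lower(), phrase) for phrase in dictated_phrases]


if __name__ == "__main__":
    for line in expand_macros(["No acute hemorrhage.", "Normal brain", "Sign off"]):
        print(line)
```

Phrases that do not name a macro pass through unchanged, so ordinary dictation and macro commands can be mixed freely.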

You can also surf the web, go to your Favorites, and use other web-browsing features by voice. Using L&H you can navigate Internet Explorer with your voice provided that you have a connection to the Internet and a relatively recent version of Internet Explorer. Commands are available that will launch Internet Explorer, load your home page and commands that will take you to specified pages on your Favorites menu.

Newer software can also enable you to read back text you have dictated. On L&H this is the "talking text" feature. However, L&H's RealSpeak is available only in the advanced and professional editions.

 

4 Speaking rate and accuracy

It has been said that if you have the right hardware, prepare your software program properly, and speak clearly, then you should be able to dictate at over 120 words per minute with between 95% and 98% (or better) accuracy using modern voice dictation programs. L&H claims an overall recognition accuracy level of about 96 percent. Accuracy might approach this level under optimum circumstances and when dictating a normal or relatively run-of-the-mill report on a professional medical dictation system. It is my gestalt, however, that it does not perform at anywhere near that level when doing a more complicated dictation. The ProgRis system claims a "very high accuracy rate", while refraining from using percentages. The literature that accompanies my L&H Voice Xpress version 5.0 software claims that you can speak at a rate of up to 160 words per minute, dictating in a natural relaxed manner. I have not actually timed my dictation speed, but the words usually do spill forth quickly on the computer screen. However, I doubt that, using current technology, you can dictate a long report with the speed of a professional typist listening to a good tape.
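It helps to translate these percentages into corrections per minute. The short calculation below uses only the figures quoted above (120 words per minute and 95% to 98% accuracy); the function name `expected_errors` is simply a label chosen for this sketch.

```python
def expected_errors(word_count, accuracy):
    """Expected number of misrecognized words for a given accuracy (0 to 1)."""
    return word_count * (1.0 - accuracy)


if __name__ == "__main__":
    # One minute of dictation at 120 words per minute.
    for accuracy in (0.95, 0.96, 0.98):
        print(accuracy, round(expected_errors(120, accuracy), 1))
```

At 120 words per minute, 95% accuracy still means about six words to correct for every minute dictated, and 98% means two or three, which is why apparently small differences in accuracy matter so much in practice.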

You may adjust sensitivity settings for recognition to obtain better accuracy depending on the speed of your computer and the background noise in your location. In ViaVoice under "ViaVoice options" you must select the voice tab and then move the slider in the direction of "exact match" if you want ViaVoice to be more critical in matching sounds with words. In L&H on the "General" tab (under "Properties") move the slider to "More Accurate" to accomplish the same thing.

 

5 Your Purchase and Getting Started

Current voice dictation software purchased for personal use comes in a package containing a CD-ROM (as is the case with many, many other application programs). A microphone (usually headset type) is usually enclosed, along with written booklets and other instructional materials that help you get started. Pricier versions are also available for portable dictation. These contain hand-held voice recorders. Software programs also usually contain a set of interactive lessons (installed from the CD-ROM) that are designed to help you get started dictating quickly. L&H's "Voice Xpress Café" is an example.

 

6 Equipment-hardware considerations

Review the computer system requirements and other information for a voice dictation system software program when you buy it. Performance with voice dictation software will vary - its success is dependent on the quality of the computer it is installed on. Particularly important are your computer's sound card, its processor speed, and its memory. I have seen it written that some voice dictation programs will actually "work fine on a lower end computer" but I find this very difficult to believe.

 

6.1 Sound Cards
For voice dictation, a good sound card is the most important part of your computer system. When you buy a sound card, mention that you want to use it also for speech recognition. Voice recognition and dictation programs will work better and the sound will be clearer with a sound card that has a high signal-to-noise ratio. Most sound cards have a signal-to-noise ratio between 85 and 90 decibels (dB). Some go to 95dB or higher.
For speech recognition in general, the important part of your sound card is the input channel, the microphone input, amplifiers and A/D converters. This is not what the sound card companies spend money on, however. To most sound card companies, a good sound card means that the sound output of your favorite computer games or music-composing system sounds very good. Just because a sound card produces good output does not mean that the sound card produces good input.
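For readers unfamiliar with the decibel figures quoted for sound cards, the standard relation between an amplitude ratio and decibels is 20 times the base-10 logarithm of the ratio. The helper below, with the hypothetical name `snr_db`, simply applies that formula; it does not measure a sound card.

```python
import math


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in decibels from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


if __name__ == "__main__":
    # A signal 1000 times the noise amplitude.
    print(round(snr_db(1.0, 0.001), 1))
```

By this formula, a 90 dB card keeps its noise roughly 31,600 times smaller in amplitude than a full-level signal, and every additional 6 dB roughly halves the noise again.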

Again, review the system requirements and other information for a software program when you buy it. They may give recommendations or approval for certain sound cards.

The recommendation accompanying L&H voice dictation systems software is for a Creative Labs SoundBlaster 16-bit compatible card or another sound card that supports 16-bit, 22 kHz recording.

Cards that have been approved by Dragon voice dictation systems include:

  1. Turtle Beach MultiSound Fiji Pro Series has been rated by Dragon personnel as the best card for their speech recognition systems. It is very quiet and has a good clean signal.
  2. The Turtle Beach Tropez Plus is also a good choice but it may not be widely available anymore.
  3. The original SoundBlaster 16 PnP manufactured by Creative Labs was a great card for speech recognition. It was quiet and inexpensive and they still may be making them. There are clones but many may fall short for voice dictation.
  4. HiVal SounTastic 16 PnP, a SoundBlaster clone is relatively inexpensive but it has worked quite well with voice dictation software.
  5. SoundBlaster AWE 32, AWE 64 or AWE 64 Gold. These are the high-end cards from Creative Labs. They are more expensive than the SoundBlaster 16 PnP and they are a little quieter but they may not be worth the extra money. In addition, there seem to be a number of different versions of these cards and not all variants may give you the same performance.

Note: Sound card makers have a habit of changing the design of their cards without advertising the fact. It has happened that a sound card manufacturer has changed the design of their card and ruined it for speech recognition.

Having an inferior sound card is one possible reason that a voice recognition dictation system will run slowly for you, but it is not the only possible reason. There may be a system problem, such as the lack of a decent processor cache or another application running in the background, which interferes with the dictation program and may therefore degrade its speed.

Listen to the quality of the sound that the voice recognition dictation system has actually recorded. It will save a file in the directory for the program. You can open this file using the standard Windows Sound Recorder program. Listen to this recording and make sure that it sounds clear and that it is free from static and distortions.
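Listening remains the real test, but a recording can also be inspected numerically. The sketch below assumes the saved file is an uncompressed 16-bit PCM WAV file, which is an assumption on my part and not something the product documentation cited here confirms; `describe_recording` is a hypothetical helper built on Python's standard `wave` module.

```python
import struct
import wave


def describe_recording(path):
    """Return sample rate, bit depth, channel count, and peak level (0 to 1)
    for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width_bytes = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    if width_bytes != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    count = len(frames) // 2
    # Little-endian signed 16-bit samples, as stored in PCM WAV files.
    samples = struct.unpack("<%dh" % count, frames[:count * 2])
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return {"rate_hz": rate, "bits": 16, "channels": channels, "peak": peak}
```

A peak at or very near 1.0 suggests the input is clipping and will sound distorted, while a very low peak suggests the microphone level is set too low.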

Some computers have poor quality built-in sound systems. A separate sound card makes for better voice recognition than sound that is built into the motherboard. The experience of some experts is that these built-in sound systems are not quite as good as separate sound cards, probably because built-in sound systems experience more electrical noise that affects the quality of the input.

It has been written that the built-in sound in the Dell Dimension desktop computers leaves much to be desired. On such a machine, you may experience the classic symptoms of a bad sound card: higher than usual error rates and, most noticeably, significant speed degradation. Some versions of the Dell Dimension computer seemed to have some type of problem in their built-in sound systems that interfered with speech recognition. If you have a Dell, you may need to purchase a separate sound card for better voice recognition. However, I have learned that at least some (probably all) of the state-of-the-art Dells that have been recently marketed do indeed have separate sound cards.

The most obvious symptom of a bad sound card, meaning one which does not have sufficient quality to run a voice recognition dictation system, is that the voice recognition dictation system will run slower than expected. Actually, it will also make more errors, but this is harder to measure and a bad sound card tends to affect the speed first and the accuracy second.

With very few exceptions, a desktop system with a separate sound card will give you better results than a laptop computer. Laptops are noisier (electrical noise) than desktops and, as implied above, speech recognition is very sensitive to noise. However, some laptops work adequately. Two models from Micron, called the XKE and VLX, are reported to have a good sound system for speech recognition.

Directions for sound card settings are provided in the voice dictation program package inserts. Note that the instructions differ depending on the Windows operating system that you are using.

6.2 Processor, processor speed and memory (RAM)
ViaVoice is optimized for Pentium II, III, and AMD-K6 with 3DNow processors. The minimum processor requirement for L&H Voice Xpress version 5.0 is an Intel Pentium I processor but a Pentium II processor at 266 MHz with MMX technology is recommended for more optimal performance. Extra instruction sets on Intel's Pentium III chip are built into the CPU to make it run faster - of value for voice dictation software. L&H Voice Xpress version 5.0 is optimized for a variety of popular processors including Intel Pentium I, Pentium with MMX technology, Pentium III, Intel Celeron and AMD-K6-2 and AMD-K6-3 with 3DNow, Athlon, and Cyrix. Dragon software is actually optimized for the Pentium III chip. The newest Dragon software technology is made specifically for this processor and the AMD Athlon chip.

L2 cache (generally referring to memory cache that is external to the CPU chip) seems to be critical to good performance. Most systems have an L2 cache but some low-end systems do not. Check before you buy. Memory and disk caches are in every computer to speed up instruction execution and data retrieval. These temporary caches serve as staging areas, and their contents can be changed in seconds or milliseconds. A level 1 (L1) cache is a memory bank built into the CPU chip. A level 2 cache (L2) is a secondary staging area that feeds the L1 cache. Increasing the size of the L2 cache may speed up some applications, but have no effect on others. An L2 cache may be built into the CPU chip, reside on a separate chip in a multichip package module, or be a separate bank of chips. Caches are typically static RAM (SRAM).

In viewing the system requirements for voice dictation software programs, and based on my own personal experience, I recommend that you use a computer with no less than 128 MB of SDRAM.
I have two computers running on Intel Pentium II MMX processors and Windows 98. I had some difficulties with a voice dictation program on one of them at 64 MB SDRAM (processor 333 MHz). I then tried it on my other computer that had 128 MB SDRAM (processor 350 MHz). The improvement in the performance was remarkable.
With less-than-adequate memory you may be able to dictate in real-time but dictation and correction may be slow and there will be a start-up delay when you switch applications.

If you have only 64 MB of SDRAM, an additional 64 MB can be easily installed. Installing memory is one of the easiest things you can do. You do not need to tinker with the BIOS. You do need to know the correct memory modules to install, which may take more than a little research if you haven't been down that road before. However, this information should come with your motherboard specifications. You should also be familiar with the configuration of your motherboard. You need to know the number of DIMMs and the number of slots that are available.

I noted, two years ago, in looking at newspapers in the Los Angeles area, that memory was selling at roughly a dollar per MB (i.e., around $125.00 for 128 MB of SDRAM). Prices for high quality memory have dropped considerably since then. For example, in November 2001 a 128 MB PC800 Rambus RIMM memory module could be purchased at Fry's electronics in Los Angeles for $40 while 256 MB of PC2100 DDR (double data rate) memory could be purchased for $65 in February 2002 at the same store.
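The size of the price drop is easier to see per megabyte. The calculation below uses only the three prices quoted above; `price_per_mb` and the `QUOTES` list are names chosen for this sketch.

```python
def price_per_mb(price_dollars, size_mb):
    """Price in dollars per megabyte."""
    return price_dollars / size_mb


# (description, price in dollars, size in MB), as quoted in the text.
QUOTES = [
    ("SDRAM, two years earlier", 125.00, 128),
    ("PC800 Rambus RIMM, November 2001", 40.00, 128),
    ("PC2100 DDR, February 2002", 65.00, 256),
]

if __name__ == "__main__":
    for description, price, size in QUOTES:
        print("%s: $%.2f per MB" % (description, price_per_mb(price, size)))
```

The three quotes work out to roughly 98 cents, 31 cents, and 25 cents per MB respectively, so adding memory had become a far cheaper remedy than it was two years earlier.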

6.3 Required hard drive space
Modern hard drive capacity is required to accommodate current voice dictation programs. For example, L&H requires 250 MB free hard drive space while ViaVoice's requirement is even greater (460 MB).
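Free space can be checked against these figures before installing. The sketch below is one possible way to do so using Python's standard `shutil.disk_usage`; the `REQUIRED_MB` table holds only the two figures quoted above, the helper names are hypothetical, and treating 1 MB as 1024 x 1024 bytes is my assumption about how the vendors count.

```python
import shutil

# Free-space figures quoted in the text, in megabytes.
REQUIRED_MB = {"L&H Voice Xpress 5.0": 250, "ViaVoice": 460}


def has_room(free_bytes, required_mb):
    """True if the free space covers the requirement (1 MB = 1024 * 1024 bytes)."""
    return free_bytes >= required_mb * 1024 * 1024


def check_drive(path="."):
    """Report, per product, whether the drive holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return {name: has_room(free, mb) for name, mb in REQUIRED_MB.items()}


if __name__ == "__main__":
    for name, ok in check_drive(".").items():
        print(name, "fits" if ok else "does not fit")
```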

6.4 Microphones
Many different microphone styles are available. You can dictate using a hand-held microphone, you may speak into one perched on a stand in front of you, or one mounted on your monitor or into a microphone array spread along your monitor's top. Microphone headsets come packaged with IBM's ViaVoice and L&H voice dictation systems. The microphone headset that comes with L&H Voice Xpress version 5.0 has "noise canceling features".

On ViaVoice the audio set up wizard configures your sound system. This audio set up wizard shows you how to plug your microphone into the sound card or USB adapter and port correctly. Once this has been done, you must also set up and tune your microphone so that the software program can recognize your voice against background noise.

If your microphone doesn't seem to be working there are likely a couple of causes. First, make sure you plugged the microphone into the right jack on your computer. Most computer sound cards have a jack just for the microphone. In addition, check to make sure you didn't set the microphone volume to mute. You can check this by double-clicking the yellow speaker in the system tray. Make sure the Master Volume isn't on mute. If you don't see a volume control for the Microphone, click Options in the Menu bar, then Properties. Find the section labeled Show the Following Volume Controls. Scroll through the list until you see the entry for microphone, click the box next to it, and click the OK button. The microphone control should now appear.

7 Creation of Speaker Profiles and Training your software program

Modern voice dictation products come with a large built-in vocabulary and they can learn new words as you use them. ViaVoice starts with a vocabulary of 160,000 words. L&H Voice Xpress version 5.0 has a 307,000-word vocabulary.

When you start to use a voice dictation software program for the first time as a new user, you must create a speaker profile that identifies you to the system. It also ensures improved recognition accuracy by storing information about your speech. This involves two kinds of information, namely:

  1. Language - what you say, and
  2. Acoustic - how you sound.

The enrollment process in the ProgRis system is completed through the Speech Enrollment module. Enrollment trains the system to respond to your voice and speech patterns with greater accuracy. Reportedly, not everyone needs to go through the enrollment process, but you should enroll if you think your speech patterns might be difficult for the system to interpret or if you find that the system does not recognize your words correctly. The ProgRis enrollment is a two-part process: first you record your speech patterns, and then you train the system to analyze your recordings. During the recording phase, the program displays a minimum of 50 phrases for you to read, giving the Enrollment program samples of your speech patterns. Enrollment notifies you when you have recorded enough sentences to begin training. This can be completed in approximately 10-20 minutes. 150 sentences remain for additional recording.

The enrollment environment should be similar in level and type of noise to the one in which you will be dictating. Do not turn off your computer or shut down Windows during the training session. Your computer must remain powered on, and you must not use it while your Enrollment is being processed. Once you are enrolled, you can log on to the application. You can do another recording session later to add more samples of your speech.

In the case of off-the-shelf voice dictation software, training a program is most definitely not a one-time affair. The most popular systems rely on various methods to implement what is in effect continuous training. For example, Lernout & Hauspie's Voice Xpress package recommends that the system be trained not once, but three to five times by each user. The effectiveness of the package increases with each training session. In addition, all of the popular packages allow various methods of on-the-fly training. If you are dictating and come across a word or phrase that the system does not recognize, you can stop for a moment and either train the system to recognize your particular pronunciation of the word or, if the word is simply not in the system's vocabulary, add that word to the dictionary. Finally, most packages let the user import typical documents (letters, memos, etc.) into some sort of vocabulary extender; the program will then parse through the document(s), adding words to its vocabulary and letting the user teach the system how certain new words are pronounced. L&H's "accuracy builder", Dragon's "vocabulary builder" and Via Voice's counterpart are component programs that enhance your recognition by analyzing the content of your existing documents and gathering important information about the language you commonly use in your writing.
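The document-parsing step of a vocabulary extender can be pictured with a short Python sketch. This is an illustration only, not the method used by the accuracy builder, the vocabulary builder, or any other commercial product: it assumes the vocabulary is a plain set of words, and the function name `find_new_words` is hypothetical. It lists the words in a sample document that the vocabulary does not yet contain, most frequent first, which is roughly the list a user would then be asked to pronounce.

```python
import re
from collections import Counter


def find_new_words(document_text, vocabulary):
    """Count the words in document_text that are missing from vocabulary."""
    # Treat a word as letters with optional internal apostrophes or hyphens.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", document_text.lower())
    known = {word.lower() for word in vocabulary}
    return Counter(word for word in words if word not in known)


# A deliberately tiny, made-up vocabulary and report fragment.
vocabulary = {"the", "shows", "a", "small", "in", "left", "lobe"}
report = ("The MRI shows a small meningioma in the left frontal lobe. "
          "The meningioma enhances.")

for word, count in find_new_words(report, vocabulary).most_common():
    print(word, count)
```

Running a batch of your own typical reports through a step like this is what lets a program learn the terms you actually use rather than relying on its general-purpose word list.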

You can install multiple voice recognition products on a computer. Each foreign language or professional product, such as L&H Voice Xpress medical solutions, comes with its own unique vocabulary. Each user must create a separate speaker profile for each vocabulary and enroll with each one. If you have created multiple speaker profiles, you must select the appropriate speaker profile for the vocabulary that you want to use during a particular session.

More than one person may create a speaker profile in a given software program on the same computer. Each user must select his or her own speaker profile when running L&H Voice Xpress or some of the other software programs. "Sharing" your user name with others will corrupt your personal speech files and cause recognition problems. Other people using your speech system must use their own user names.

As mentioned, you should always use your own speaker profile whenever you dictate. You can export your speaker profile for use on another laptop or desktop computer. To use a speaker profile on a different computer you must export it from your current computer and import it to the other computer (see below). Then you must run microphone tuning with the imported speaker profile.

Invoking certain dialogs in a dictation program will allow the user to correct misrecognitions. For example, in L&H, a single word misrecognition may be corrected by highlighting the incorrect word and saying "correct that". Noting the correct word on the "correct for accuracy" dialog box will rectify the matter. Via Voice's correction window provides a similar function in the IBM product.

Multiple word misrecognitions can be corrected in L&H using the "add and train" dialog box, into which you type the desired words or phrase and then follow the on-screen prompts to enunciate them.

Voice dictation programs tend to favor long phrases over short words, so a phrase that you add to the vocabulary will be much more likely to be recognized even if you don't pronounce it completely accurately. Some people have added hundreds of phrases to their vocabulary and claim that this improves their accuracy significantly.

If you are having trouble with repeated errors, then you can use L&H's accuracy builder, Dragon's Vocabulary Builder or Via Voice's counterpart to help. To do this, dictate or type some paragraphs that include samples of the words or phrases that the program is having trouble recognizing. Then duplicate those paragraphs two or three times in a document and include that document the next time you run the accuracy builder, Vocabulary Builder, etc.

8 Recognition Modes

Voice recognition programs may employ several different modes depending on what the user wishes to execute. In L & H Voice Xpress version 5 the default mode is the "dictation and command mode". It is also known as "normal mode". You may also say "switch to Dictation mode", "switch to Command mode" or "switch to Spelling mode" if you wish to dictate, command, or spell exclusively.

9 Setting up for the Dictation Session (Dictation Mode)

The start up procedure is different for each software program. For example, with L&H Voice Xpress the selection of your speaker profile, selection of correct input device, running microphone tuning for best recognition, and a number of other actions are carried out as follows:

  1. Activate your voice recognition program - in general this is done as you would open any other program
  2. Select the correct speaker profile for the dictation you plan to do. You should always use your own speaker profile whenever you dictate.
  3. Select the correct input device and make sure it is properly positioned. Microphone positioning is key to getting consistent performance. In the headset configuration, the microphone element should be positioned by the corner of your mouth - about 1/2 inch from your mouth. Make sure that the microphone does not move during a dictation session.
  4. Turn on Microphone - this may involve using both the microphone's on/off switch (if it has one) and activating it on the software program's toolbar ("microphone on" is selected from the L&H toolbar). With L&H Voice Xpress 5.0, preparation of your microphone takes place once your speaker profile has been loaded. You must then set up and tune your microphone so that the software program can recognize your voice against background noise.
  5. Run microphone tuning - this involves two steps: measuring background noise and then comparing your voice to the background noise. Each time you dictate, do mic tuning for the current acoustic environment - this takes only a few seconds but it can make a tremendous difference in recognition accuracy.
  6. Microphone tuning is part of the startup process each time you commence a session with L&H Voice Xpress 5.0, whether or not you actually want it.
  7. To create documents by voice, select the word processing program you want to dictate to (e.g. Microsoft Word, Wordpad, WordPerfect, L&H XpressPad, Via Voice SpeakPad, etc.).
    Though not ordinarily listed as part of the instructions for voice dictation I recommend backing up your document frequently. Although I have not found anything in the voice dictation literature discussing it, I think that you should know how to protect against inadvertent deletion of the content of your document (especially one that is lengthy and would take considerable time to redictate) while you are in the process of using your voice dictation system.

    An occasional problem that has occurred when I have used voice recognition software (for some inexplicable reason) is total deletion of the entire document. This is not a major problem when you are just starting on the document but it can be very irritating when you have half an hour's dictation on the line. Thus, I take special precautions when I am about to prepare a long document. Some options can be selected from the Word toolbars that can be used to create backups that would be helpful in such a circumstance. I select "tools" and, from the drop down menu for "options", there are three things that I check under the "save" tab. These three things are "always create a backup copy", "allow background saves", and "save auto recovery info every:" (and alongside that, I select "1 minutes").

    When in the process of actually dictating a long document in Word, I go to "File" and from the drop down menu that appears I select "versions."

    In the dialog box that appears after clicking "versions" I check "automatically save a version on close" and I click on "save now" for the most absolute assurance of a save, and then I close. Therefore, should the most recent version be inadvertently deleted, going back to the recent previous version will rescue much of what has been dictated.

    Also, be careful about creating a phrase that sounds like "delete that" - that might be misinterpreted as a command at the wrong time!!
  8. Close down other programs - always close down all other programs when you want to use your voice dictation software. Your voice dictation program needs to have all of the system resources that are available.
  9. Use ScanDisk and Disk Defragmenter frequently and especially if things get sluggish

    ScanDisk is a worthwhile program on your computer's Windows operating system that checks the physical surface of the hard drive and the way that each bit of data is stored to ensure that everything is OK. Running ScanDisk should be part of your regular computer maintenance.

    Your hard disk should also be defragmented periodically to put your files back into order using Disk Defragmenter - another system tool also found under "Accessories" on your computer's Windows operating system. On a brand new computer, the hard disk is relatively empty so new data is written to the hard disk in one contiguous block. The computer can quickly access the data because it is all in one place. After a while, the computer is no longer able to save information in large blocks but rather keeps them in many separate little empty nooks and crannies of your hard disk. Thus, a file may be broken up, or fragmented, into little pieces and stored in many different areas of the hard disk. The computer ingeniously keeps track of the addresses of each piece of data and puts it all together when it is needed. Yet, obviously, the more broken up the information is, the longer it takes to access the data and the slower the computer operates. Fortunately, your Windows computer comes with a simple-to-use program that will defragment your hard disk. This process reorganizes the disk by putting files into continuous order and gathering all the free space on the hard disk into one block making data retrieval faster and easier for the computer.
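For readers who like to see the idea in miniature, the fragmentation described in step 9 can be modeled with a toy Python sketch. It assumes a "disk" is simply a list of blocks, each holding a file name or `None` for free space; the function names are hypothetical, and the real Disk Defragmenter is of course far more involved than this.

```python
def count_fragments(disk, name):
    """Number of separate contiguous runs of blocks belonging to the named file."""
    runs = 0
    previous = None
    for block in disk:
        if block == name and previous != name:
            runs += 1
        previous = block
    return runs


def defragment(disk):
    """Return a new layout with each file contiguous and free space at the end."""
    # Remember files in the order in which they first appear on the disk.
    order = []
    for block in disk:
        if block is not None and block not in order:
            order.append(block)
    # Lay down all blocks of the first file, then all of the second, and so on.
    packed = [name for name in order for block in disk if block == name]
    return packed + [None] * (len(disk) - len(packed))


# File "A" is scattered across three places; None marks free space.
disk = ["A", "B", "A", None, "B", "A", None, None]
print(count_fragments(disk, "A"))              # 3
tidy = defragment(disk)
print(tidy)                                    # ['A', 'A', 'A', 'B', 'B', None, None, None]
print(count_fragments(tidy, "A"))              # 1
```

After defragmenting, each file occupies a single run of blocks and the free space forms one block at the end, which is the situation the paragraph above describes as faster to read.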

To use ProgRiS Voice Recognition, you must first log on to the system. Prior to logging on, you should be set up as a user through the Data Editor. Your System Administrator sets up you and your radiologist colleagues as users in the system. The working day dictation session proceeds as follows:

  1. To start ProgRis select "Programs" then "ProgRis" then select the "Report Dictation-Approval Mode".
  2. Log onto the workstation by typing your user ID and password, and pressing the Enter key.
  3. On the ProgRis top toolbar select "Report" and click "open".
  4. The ProgRis-Exam selection dialog box appears. Scan the barcode on the patient's requisition and click "OK".
  5. You may paste onto the clear screen a previously compiled template at this point (especially useful in the case of a normal examination)
  6. Be sure the microphone is on. Say "start dictation" or "begin dictation" into the microphone, or click the Start/Stop Dictation icon. The color of the report screen changes to turquoise.
  7. Begin dictating your report. When you begin speaking, the text displays.
  8. When you have finished entering your report or as much of it as you wish to record at the time, say "stop dictation" or "end dictation" or click the leftmost icon. The background on the screen returns to white.
  9. At this time, before accepting and signing your report, you can make changes to the report text by using the mouse and keyboard as you would with any other document on your computer.
  10. You can click Start/Stop Playback or say "Playback" to listen to what you have just dictated. Compare the playback to the text in the transcription area.
  11. When you have ended your dictation, you can make it final (see "Approve", below) or select the following option: Save the report with the "Save as Preliminary" voice command (it is recommended that you do this before doing anything else or whenever you have dictated a substantial amount of text). The "Save as Preliminary" function may be used when a radiologist wants a transcriptionist to review the report he or she has just dictated before reviewing and approving it. Using this function reduces the overall benefit of the system, because it requires additional personnel and lengthens report turnaround times.
  12. When I have completed the report, I select "Radiologist" from the ProgRIS top toolbar and then, from the dropdown menu, I select "Approve", which effectively accepts and signs the report. Since this function makes a report into a medical record that cannot be changed, you should always carefully proofread a report before approving it. You can also accept and sign the report with the "Save" voice command. If you find later that you have anything to add, you can add an addendum report.

10 Spelling Mode

When using natural spelling the best performance is achieved by spelling continuously and rapidly without pausing in between letters.

In L&H, prepare for spelling by saying "switch to spelling mode" (say "begin spell" in Via Voice). Alternatively, you can select "spelling mode" from the recognition mode menu on the L&H toolbar. This will get you out of dictation and command mode, the "normal" mode. In ViaVoice, spelling mode may also be used to dictate a sequence of digits.

If you have problems spelling in the usual way then you can also try spelling using the military alphabet. However, you must pause when switching between the natural alphabet and the military alphabet. The words used to dictate letters in the military alphabet are as follows: Alpha, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliette, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, Zulu.

Remember, you can speak the military alphabet continuously or the natural alphabet continuously in the correction mode, but if you mix the two you must pause between them.
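As a small illustration of how the two alphabets map onto the same letters, the following Python sketch converts a sequence of spoken tokens into a word. It is not how any dictation product handles spelling; the function name `spell` and the token format are assumptions made for the example, and the code words are those listed above.

```python
CODE_WORDS = [
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
    "Hotel", "India", "Juliette", "Kilo", "Lima", "Mike", "November",
    "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
    "Victor", "Whiskey", "X-ray", "Yankee", "Zulu",
]

# Each code word stands for its own first letter.
MILITARY_ALPHABET = {word.lower(): word[0].lower() for word in CODE_WORDS}


def spell(tokens):
    """Turn spoken tokens (plain letters or military code words) into a word."""
    letters = []
    for token in tokens:
        key = token.lower()
        if key in MILITARY_ALPHABET:
            letters.append(MILITARY_ALPHABET[key])
        elif len(key) == 1 and key.isalpha():
            letters.append(key)
        else:
            raise ValueError(f"unrecognized spelling token: {token!r}")
    return "".join(letters)


print(spell(["Mike", "Romeo", "India"]))   # mri
print(spell(["c", "Tango"]))               # ct
```

The sketch accepts mixed input freely; a real dictation program, as noted above, needs a pause when you switch between the two alphabets.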

 

11 Command Mode

As mentioned above, the "dictation and command mode" (also known as "normal mode") is the default mode in L & H Voice Xpress version 5. If you wish to execute commands exclusively, you can say "switch to Command mode".

Be sure to pause before and after saying a command to distinguish it from text - but don't pause in the middle of the command. Via Voice processes your words as dictation until you pause - then it starts to listen for a command.

Some examples of the commands that I like to use in L&H are:

In general, you will find that some commands work better than others do. As you become more experienced with your system, you will find what works best for you. When using commands, your program can occasionally become refractory to a particular term used for a particular command. Therefore, it is a good idea to have other options. For example, I find that when "new paragraph" will not work I might try "new line", and that will almost always be successful in starting a new paragraph. Should the system become refractory to voice commands, it is sometimes more expeditious to perform some of these commands in the conventional way using the keyboard.

12 Steps toward satisfactory dictation and creating word documents

Dictate in continuous even phrases without pausing between words. Do not say one word at a time in a "robotic" speaking style. Talk at a normal pace, be articulate and say each word clearly (like a broadcast announcer) to give your speech the best chance of being understood.

Verbalize every word. Software programs cannot read your mind. If you want the software program to recognize a word you must say the word. In conversations between people, you may sometimes be able to drop words or slur words together and yet be understood. If you drop a word, do not expect a voice recognition program to pick it up or fill it in. Software programs do not understand shortened pronunciation in small words such as "and" where you might just use "n" (for example, when saying "Head 'n' Neck" instead of "Head and Neck").

Talk to the software program the way you did when you trained it. The software program learned the way you speak from those samples. If you speak as if you're reading something, then your accuracy may improve. You can re-run the training session in an attempt to retrain the software as to the way you dictate.

Composing while dictating is a new skill for many people. The trick is to try to think about what you're going to say before you say it. Then, when you speak, your dictation will be clear because you already know what you are going to say. If you talk without thinking, you'll tend more to mumble and you will find your words become more garbled and less distinct when you change your mind as you speak. Be sure to speak every word and try not to stutter or say "um".

Some individuals find it helpful to organize their ideas and dictate an outline first - then dictate the content. Dictating from handwritten notes may facilitate the whole dictation process.

Additional points:

  1. Pause whenever you need to collect your thoughts.
  2. It may be best to turn off your microphone when it is not in use (do this on L&H by saying "stop listening"), for example when you pause to contemplate subsequent paragraphs or when you are waiting for temporary extraneous noise to clear. When on, your microphone will pick up breath noise and conversations, and this noise can add unwanted elements to the document that you are working on.
  3. Keep your eye on the monitor screen as you dictate. You want to be sure that your dictation is being heard by the system, and that jargon is not being produced. Again, since the possibility exists that a dictation program may misinterpret what it hears as a command (and since the position of the cursor on the monitor screen may shift unexpectedly), it is always wise to keep an eye on the screen during voice dictation.

Once you are comfortable with basic dictation, there are a few other things to try:

  1. Learn how to correct misrecognition and invoke training tricks in ways that will enhance future recognition. See also the tips in the section on "Creation of Speaker Profiles and Training your software program" (part 7). In ViaVoice and L&H you may get some unexpected results when you say a word that the program interprets as a command. This should be corrected as a recognition error. Via Voice processes your words as dictation until you pause - then it starts to listen for a command.
  2. Be sure to pause before and after saying a command to distinguish it from text - but don't pause in the middle of the command.
  3. Try basic navigating by voice.
  4. Use any combination of mouse, keyboard, and/or commands that enables you to add your text, navigate, or edit in whatever way is easiest for you. Generally speaking, you can edit your text as you dictate, or you can dictate first and go back over the text later and make changes. During voice dictation, when a number of words or phrases do not come out as I would have wished, I will sometimes redictate them more clearly and then later on (when I edit what I have dictated) I delete the undesirable elements.

13 Problems

Digital voice dictation is far from perfect at the present time; many errors and problems may arise during a dictation session, including (amongst other things) the following:

Performance with voice dictation software will vary based on the quality of your sound card, the amount of memory available and the speed of your processor, amongst other technical factors. The quality of your own dictation is important. Hesitant dictation, slurred words, and meaningless noises will confound your efforts.

14 Backing up and Exporting a Speaker Profile

You can export your speaker profile for use on another laptop or desktop computer. To use a speaker profile on a different computer you must export it from your current computer and import it to the other computer. Then you must run microphone tuning with the imported speaker profile.

For example, to export a speaker profile in L&H, select "Properties" and then select the startup tab. On the startup tab, select the speaker profile you want to export and click "export". On the export speaker profile dialog, specify the drive letter and folder where you want to put the exported profile and click "save". L&H Voice Xpress exports your speaker profile to a file with a .vxs extension. To use the exported profile, you must first import it into L&H Voice Xpress on the other computer.

To import a speaker profile in L&H, click on the import button on the Select a Speaker profile dialog. Otherwise, click "properties", then click the startup tab, and then click "import". Select the speaker profile (.vxs file) to import and click "open". Voice Xpress adds the imported speaker profile to the list of existing profiles on that computer (or updates the profile if it is already on the list). Run microphone tuning on the imported speaker profile.
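Once a profile has been exported to a .vxs file, keeping dated copies of that file is a simple way to back it up. The Python sketch below is one way this might be done and is not part of L&H Voice Xpress; the function name `back_up_profile` and any file or folder names you pass to it are hypothetical. It copies an exported file into a backup folder under a name that includes the date and time.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_profile(exported_file, backup_dir, now=None):
    """Copy an exported speaker profile into backup_dir under a timestamped name.

    `now` may be supplied to fix the timestamp; by default the current time is used.
    Returns the path of the copy.
    """
    source = Path(exported_file)
    if not source.is_file():
        raise FileNotFoundError(source)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target
```

For instance, calling `back_up_profile("myprofile.vxs", "profile_backups")` (both names invented here) would leave a copy such as `myprofile-<date>-<time>.vxs` in the backup folder, so that an older profile can be imported again if the current one becomes corrupted.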

 

