
I can appreciate the utility..
But when my friend tried to teach me a little *correctly inflected* Russian (and I generally have a good ear, and voice, for 'language sounds'), you'd (maybe) not believe how Many ways you can (try to) say:
Tu grosnya kapitalistichiskaya svinya!
You filthy capitalist swine!
I know that a storage-scope display of just one of these words would show the nasal and throat qualities as low-frequency waveforms, with the sibilants as high-frequency modulation at the start.. but -
Now add in the mumblers, the couth-less, the local accents.. Perhaps Ben's suggestions below can tell you how near we might be. I'd opine that if voice recognition is up to querying a truly international glossary of sounds vs. a valid sample of a message of more than a few words, your accuracy would reflect the state of the (Office) art.
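A rough way to see the scope-display point without a scope is the zero-crossing rate: how often the waveform changes sign from one sample to the next. It comes out high for hiss-like sibilants and low for voiced nasal/throat sounds. This is a minimal sketch under stated assumptions: the 8 kHz sample rate, the 120 Hz "voice", and white noise standing in for the "s" are all made up for illustration, not measurements of real Russian speech.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed telephone-quality rate, chosen for illustration
FRAME_LEN = 160     # samples per frame, i.e. 20 ms at 8 kHz


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def voiced_segment(n, f0=120.0):
    """Synthetic stand-in for a nasal/throat sound: a low fundamental plus one harmonic."""
    return [
        math.sin(2 * math.pi * f0 * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / SAMPLE_RATE)
        for t in range(n)
    ]


def sibilant_segment(n, seed=0):
    """Synthetic stand-in for an 's': uniform white noise (hypothetical, not real speech)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def frame_zcr(samples, frame_len=FRAME_LEN):
    """Zero-crossing rate for each complete, non-overlapping frame."""
    return [
        zero_crossing_rate(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# A toy "word" such as "svinya": 100 ms of hiss followed by 200 ms of voicing.
word = sibilant_segment(800) + voiced_segment(1600)
rates = frame_zcr(word)
for i, r in enumerate(rates):
    print(f"frame {i:2d}: ZCR = {r:.2f}")
```

The five hiss frames land around 0.5 and the ten voiced frames at roughly 0.03. Real recognizers use much richer spectral features, so treat this only as a picture of the low- vs high-frequency contrast.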
I do recall that (a couple of years ago, I think) there were some algorithms better suited to one-shot guesstimates and others to the 'training' approach. The latter produced much higher overall accuracy (99% for deliberate, slow speech?).
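On the 'training' side, one classic way to do speaker-trained, isolated-word recognition is template matching with dynamic time warping (DTW). The user records each word once, and a new utterance is labelled by its nearest stored recording, with the warping absorbing differences in speaking rate. This is a minimal sketch, not necessarily what those systems actually used; the one-number-per-frame "features", the `enroll`/`recognize` helpers, and the da/nyet templates are all hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or step both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def enroll(templates, label, sample):
    """Hypothetical 'training' step: keep the user's own recording of a word."""
    templates.setdefault(label, []).append(sample)


def recognize(utterance, templates):
    """Return (label, distance) of the nearest enrolled template under DTW."""
    best_label, best_dist = None, math.inf
    for label, samples in templates.items():
        for sample in samples:
            dist = dtw_distance(utterance, sample)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical one-number-per-frame features (e.g. frame energy) from enrollment.
templates = {}
enroll(templates, "da", [1.0, 5.0, 1.0])
enroll(templates, "nyet", [5.0, 1.0, 5.0])

# The same "da", spoken deliberately and slowly: each frame is held longer.
slow_da = [1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 1.0]
print(recognize(slow_da, templates))  # ('da', 0.0)
```

The warping lets each stored frame absorb several slowly spoken ones, so the slowed-down "da" matches its template exactly. The catch is that every new speaker, accent, and mumbler needs their own enrollment.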
You can bet the Feds have already had a chat with IBM and Kurzweil (?). You can also bet that further improvement will Not come from a Billy "writing neat tight C+ code"* (the pompous, arrogant, snivelling Lying bastard). You can't code without a productive algorithm (right?)
* yeah the little prick actually Said that was "his hobby!" - got a link somewhere.
Luck,
A.