Post #17,646
11/10/01 12:25:01 AM
|

OT you have experience with waveforms right?
trying to come up with an idea. Can the language being spoken be determined from a recording of a voice conversation by examining the sound wave? Instead of hundreds of linguists, a single solitary chip could identify the language. thanx, bill
tshirt front "born to die before I get old" tshirt back "fscked another one didnja?"
|
Post #17,652
11/10/01 12:41:46 AM
|

Don't know enough.
Spectrum analyzer is one view; FFT (Fourier) another. In real time - some phonemes are discernible by those who stare at lots of these, particularly the 'attack', start of a sibilant. There could well be a few noticeable characteristics, especially statistically over a lengthy message.
Whether this adds up to (anyone?) being capable of inferring accent? language..? can't say. If it's hard for people to do, it would have to be lots harder for a robot. It would be one big modelling and stat program (?)
Like say: grading diamonds via Tee Vee camera ?! in the visual sphere.
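For anyone who wants to see the FFT view in numbers rather than on a scope, here is a minimal sketch. It assumes pure synthetic tones as stand-ins for a low-frequency voiced sound and a high-frequency sibilant (real sibilants are noise-like, not single tones), and the names `magnitude_spectrum`, `peak_frequency`, `tone`, `SAMPLE_RATE` and `FRAME` are made up for illustration. It uses a naive DFT so nothing beyond the standard library is needed.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed telephone-grade sampling
FRAME = 256         # samples per analysis frame (assumption)

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for one short frame)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin in the frame."""
    mags = magnitude_spectrum(samples)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * sample_rate / len(samples)

def tone(freq, n=FRAME, rate=SAMPLE_RATE):
    """Synthetic sine tone, a crude stand-in for a speech sound."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

voiced = tone(250)     # stand-in for a nasal/throat sound
sibilant = tone(3000)  # stand-in for the 'attack' of a sibilant

print(peak_frequency(voiced, SAMPLE_RATE))
print(peak_frequency(sibilant, SAMPLE_RATE))
```

Each bin is 8000 / 256 = 31.25 Hz wide, so both test tones sit exactly on a bin. Real speech would smear across many bins, which is where the statistics over a lengthy message would have to come in.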
Sorry,
A.
|
Post #17,708
11/10/01 8:10:35 PM
|

The idea is what the feds are asking help with
If the language can be identified, the recording can be shipped to a linguist for translation. The req is for identifying specific languages by machine and triggering a data point for further investigation. thanx, bill
|
Post #17,715
11/10/01 8:42:41 PM
|

I can appreciate the utility..
But in my friend trying to teach me a little of *correctly inflected* Russian (and I have a good ear for, voice for 'language sounds', generally) - you'd (maybe) not believe how Many ways you can (try to) say:
Tu grosnya kapitalistichiskaya svinya! You filthy capitalist swine!
I know that the scope display of just one of these words (storage scope that is) would show the nasal, throat qualities as lo-freq. waveforms, with the sibilants as hi-freq modulation at start.. but -
Now add-in the mumblers, the couth-less, the local accents.. Perhaps Ben's suggestions below - can tell you how near we might be. I'd opine that: if voice-recog. is up to querying a truly International glossary of sounds VS a valid sample of a message of more than a few words: your accuracy would reflect the state of the (Office) art.
I do recall that (couple years ago I think) there were some algorithms better suited for one-shot guesstimates / others for (the 'training' approach). The latter produced much higher overall accuracy (99% for deliberate slow speech?).
You can bet the Feds have had a chat with IBM and Kurzweil already. Can also bet - further improvement will Not come from a Billy, "writing neat tight C+ code"* (the pompous, arrogant snivelling Lying bastard). You can't code without a productive algorithm (right?)
* yeah the little prick actually Said that was "his hobby!" - got a link somewhere.
Luck, A.
|
Post #17,721
11/10/01 10:15:22 PM
|

how I would approach the problem
have a US groupie walk thru the bazaar in Quetta, Kabul, Medina with an open mike and record all the sounds. Run software that will turn it into waves, then match these files against crowd noise in other countries. The aggregate will define those slop mouths, mumblers, non-native language speakers etc. and hopefully a common identifier. If the general theory sounds interesting I will be putting in a bid. They have approached the usual suspects for this kind of thing but they have failed. It is now being presented to the garage inventors, the tinkerers, the IWETHEYers. If the one page concept gets flagged I will need a three page, then the next step is a detailed outline followed by a contract for work. This is something we as a group could share in, although not all might be in favor of that particular ability to pick out a language like that. thanx, bill
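A rough sketch of the "aggregate" idea above, under heavy assumptions: each recording is boiled down to two crude features (zero-crossing rate and RMS level), the recordings from one place are averaged into a profile, and a new recording is matched to the nearest profile. The labels `site_a` and `site_b`, the synthetic tones standing in for crowd recordings, and every function name here are hypothetical. Real crowd noise would need far richer features than this.

```python
import math

RATE = 8000  # Hz, assumed sampling rate

def make_tone(freq, n=800, rate=RATE):
    """Synthetic stand-in for a recording (not real crowd audio)."""
    return [math.sin(2 * math.pi * freq * t / rate + 0.1) for t in range(n)]

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def features(samples):
    return (zero_crossing_rate(samples), rms(samples))

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n
                 for i in range(len(vectors[0])))

def build_profiles(labelled_recordings):
    """Average the features of all recordings under each label."""
    return {label: centroid([features(r) for r in recs])
            for label, recs in labelled_recordings.items()}

def closest_profile(samples, profiles):
    """Label whose aggregate profile is nearest in feature space."""
    f = features(samples)
    return min(profiles, key=lambda label: math.dist(f, profiles[label]))

profiles = build_profiles({
    "site_a": [make_tone(f) for f in (200, 250, 300)],     # toy data
    "site_b": [make_tone(f) for f in (1800, 2000, 2200)],  # toy data
})
print(closest_profile(make_tone(260), profiles))
print(closest_profile(make_tone(1900), profiles))
```

Nearest-centroid matching is only one way to use an aggregate. It is picked here because it is the shortest thing that shows the shape of the approach.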
|
Post #17,654
11/10/01 12:47:58 AM
|

It's a tough problem.
Current speech recognition software usually has to be trained (e.g. voice dictation stuff like IBM's ViaVoice), and has a limited vocabulary.
[link|http://www.sensoryinc.com/|Sensory, Inc.] makes chips for voice activation and speech synthesis.
[link|http://www.wired.com/wired/archive/8.05/tpmap.html|Wired] has an article on universal translation issues.
And NSA and other government agencies probably have people working on this problem too...
In the problem you posed, you'd have to be able to distinguish between things like heavy dialects and sloppy grammar, and language differences. And just identifying the language is only the first step - after all you want to know what they're yammering, not just the language. :-)
I remember hearing a seminar from someone doing research for the Air Force in the late 1970s who was trying to figure out how to define "B-ness" as a first step toward designing a system to read printed text. Independent of the font. It's not trivial, but it's a problem that's pretty well solved now (for some values of "solved").
He also told of how people would say:
"Merry Mary got married" in different regions of the country. "Meery meery got meeried" and dozens of variations.
Think of the different ways people say "water".
I'm no expert, so take my thoughts with a grain of salt. :-)
Happy hunting!
Cheers, Scott.
|
Post #17,684
11/10/01 2:24:24 PM
|

Probably only generally
Voice recognition software at a more fundamental level takes speech and breaks it into recognizable phonemes (i.e. specific chunks of sound that make up parts of syllables) and then puts them back together into syllables and words.
But you should be able to make a good guess as to what language is being spoken by just taking the raw phonemes and doing a frequency analysis on them. That, combined with some basic analysis of rhythmic patterns, is probably what you do when you can tell that someone you aren't listening to closely sounds like they are talking in German, Italian, etc.
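A minimal sketch of the phoneme frequency analysis, assuming some front end has already turned the audio into a list of phoneme symbols (that part is the hard bit and is not shown). The reference languages `lang_A` and `lang_B` and their phoneme strings are invented toy data, not real phoneme statistics, and the function names are made up. The rhythm analysis is left out entirely.

```python
import math
from collections import Counter

def profile(phonemes):
    """Relative frequency of each phoneme symbol in a sequence."""
    counts = Counter(phonemes)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def guess_language(phonemes, references):
    """Reference language whose phoneme profile best matches the sample."""
    sample = profile(phonemes)
    return max(references, key=lambda lang: cosine(sample, references[lang]))

# Toy reference profiles (invented symbols, purely illustrative).
references = {
    "lang_A": profile("k a t a k a n a t a k a".split()),
    "lang_B": profile("s h i s u s h i s o s u".split()),
}

print(guess_language("t a k a".split(), references))
print(guess_language("s u s h i".split(), references))
```

With real data the references would come from many hours of speech per language, and a short or mumbled sample could easily land between two profiles.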
Cheers, Ben
|