In network video conferencing, people want a close-up of whoever is speaking, with the picture synchronized to the voice. How, then, can this technology break away from the traditional manual search and let the equipment itself "find the speaker by voice"?

The speaker's voice is not picked up clearly; the two sides cannot hear each other; the audio cuts in and out, so that participants cannot understand one another... Even with today's video conferencing systems, we still run into these problems often. Reproducing the sound and image of a remote meeting as realistically as possible, so that participants feel they are in the same room, has always been the foremost task of the sound processing field.

Among video conferencing products, Chinese products emphasize video codec and sound processing technology, while European and American products put more weight on software and hardware equipment, management coverage, and research into integration with unified communications frameworks. Japanese products, for their part, have consistently held to a fine-grained technical division of labor: Sony works deep in the monitoring field, while Yamaha specializes in audio processing.

A few days ago, Yamaha of Japan brought a touring exhibition of its PJP (Projectphone) series of network conferencing products to China, hoping to introduce an entirely different technical approach to this field. Our reporter's exclusive interview with Tian Wan Zhuo Ye, general manager of the Sound Network division of Yamaha Corporation, and Gu Tian, CTO for Yamaha's network conferencing product series in China, addresses the following questions: What technology does Yamaha use to fix the current flaws in sound transmission and to assist the video function? How can the severe echo and clipped words in video conferences be resolved? How should the important recording function in a conference be designed? And how can network and video conferencing products be designed along the lines of artificial intelligence, so that this computer technology ends up feeling natural and fitting human habits?

Teaching a machine to find the speaker by sound

According to Gu Tian, the newest network conferencing concept put forward by Yamaha's PJP series is how to make the new generation of video conferencing systems more human-like, in line with what artificial intelligence demands. These concepts are embodied in a series of technologies, such as speaker array technology, instantaneous sound design, and a built-in adaptive echo canceller.

Suppose site A and site B hold a remote video meeting together. It is very important for the people at site A to be able to tell clearly which participant at site B is speaking (video tracking). In dealing with this problem, Cisco and Nortel-Polycom take a lavish approach: build a telepresence conference room costing more than 300 thousand dollars, lease ultra-wide bandwidth to carry sound and image, and use an oversized TV wall to reproduce the scene and the participants at 1:1 scale. Suzhou Keda and other domestic companies take a comparatively economical approach: the people at site A use a handheld remote control to pan the camera in site B's conference room and adjust its focal length, either to look for participants who do not fit on the small display screen or to bring up a close-up. Overall, both approaches amount to "finding the speaker by voice", but the searching is done by the human eye.
