TNA project : Multi-dimensional visualization of speech signal hierarchy: evaluation



Acronym : 128-Multi-dimensional visualization of speech signal hierarchy evaluation-Vainio

Project Lead : Martti Vainio From : University of Helsinki

Dates : from 22nd January 2014 to 29th January 2014

Description :



Motivation and objectives :
The speech signal is arguably the most complex signal in existence, whether biological or man-made. It carries information in a hierarchical fashion, on temporal scales ranging from microseconds to several seconds, perhaps longer. The signal is, moreover, the product of continuously moving vocal organs and thus contains virtually no static parts. Historically, speech has been visualized by viewing separate one-dimensional signals, such as the waveform itself, estimated intensity contours, and estimated fundamental frequency contours. The main two-dimensional means of visualizing speech has been the spectrogram, which is based on the Fourier transform. The Fourier transform itself suffers from the uncertainty principle: one cannot simultaneously view the formant structure produced by the changing resonance characteristics of the vocal tract and the fundamental frequency produced by the vibrating vocal cords. Analyzing and representing the speech signal in multiple dimensions simultaneously should, in principle, provide more material for the human visual system to pick up patterns that we know are there, both from theoretical linguistics and from the thousands of phonetic experiments on speech prosody that have used, for example, manipulated signals or simple operationalizations of fairly crude signal representations.
We have recently started work on representing and analyzing speech, especially speech prosody, using the (continuous) wavelet transform (CWT). The advantage of a scale-invariant analysis over a range of temporal scales is that the inherently hierarchical nature of the signal becomes visible. The analysis reveals not only how information is distributed in time, but also the possible interdependencies between levels, as clearly visible tree structures. These are typically easy to interpret in terms of phonological linguistic hierarchies that relate to how linguistic information is distributed in the speech signal. However, viewing a dimensionally reduced slice limits the possibility of seeing patterns that might emerge from a common state space calculated from two or more separate sources. That is, the hierarchy revealed by a CWT of the raw signal (the signal envelope) can be hypothesized to join with the hierarchy revealed by the fundamental frequency contour in extremely interesting and informative ways. In sum, a multi-dimensional visual representation of the speech signal should resemble the parallel manner in which the brain handles these signals. The CWT can further be considered akin to the signal transformed by the cochlea for analysis at higher levels of the auditory system, and it resembles the data the brain processes in order to make sense of speech; this, we assume, is close to what the brain does with visual signals.
In the proposed project, we will develop new multi-dimensional visualization methods based on wavelet transforms of the original speech waveform, its envelope, and the estimated f0 contour. The added dimensionality of the visualization, we argue, will provide the means for the researcher to see the complexity of the speech signal in a new way and help him or her formulate new hypotheses about the interconnections between the different types of information present in the signal. The new visualization methods will be based on CWT-based analyses projected into a 3D space using the latest methods in both holography and virtual reality. This project is closely linked to Dr. Juraj Simko's implementation project, Multi-dimensional visualization of speech signal hierarchy: Implementation (MDVSSH-I). A minimal sketch of the kind of analysis described above is given below.
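The following is a minimal, illustrative sketch of the multi-scale analysis described above, written in Python with NumPy, SciPy and PyWavelets; the project's own tooling is not specified in this report, and the placeholder waveform, the 100 Hz frame rate and the 20 ms to 2 s scale range are assumptions made for illustration only.

import numpy as np
import pywt
from scipy.signal import hilbert, resample_poly

def cwt_hierarchy(signal, fs, num_scales=64, wavelet='morl'):
    """Continuous wavelet transform over log-spaced scales (~20 ms to ~2 s).

    Returns a (scales x samples) coefficient matrix plus the matching
    pseudo-frequencies, so that segment-, word- and phrase-level modulations
    appear as rows of increasing temporal scale.
    """
    scales = np.geomspace(0.02 * fs, 2.0 * fs, num_scales)
    coefs, freqs = pywt.cwt(signal, scales, wavelet, sampling_period=1.0 / fs)
    return coefs, freqs

# Illustrative inputs; a recorded utterance and a tracked f0 contour would be used instead.
fs = 16000
speech = np.random.randn(2 * fs)                 # placeholder waveform, 2 s long

# Amplitude envelope via the Hilbert transform, downsampled to a 100 Hz frame rate
# so that the prosodically relevant 20 ms .. 2 s scale range stays cheap to compute.
envelope = np.abs(hilbert(speech))
env_fs = 100
envelope = resample_poly(envelope, up=1, down=fs // env_fs)

# Toy f0 contour at the same frame rate (in practice, the output of a pitch tracker).
f0 = 120 + 20 * np.sin(2 * np.pi * 0.5 * np.arange(2 * env_fs) / env_fs)

env_coefs, env_freqs = cwt_hierarchy(envelope, env_fs)
f0_coefs, f0_freqs = cwt_hierarchy(f0, env_fs)

# env_coefs and f0_coefs are two scalograms on a shared time/scale grid; they can be
# stacked into the common state space discussed above and rendered in 3D.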

Teams :
The work is proposed by the SigMe group (Multi-sensory signals and meanings, University of Helsinki), which investigates the extent to which human comprehension of the environment, speech and language is grounded in multi-sensory interaction and action.

Dates :
starting date : 22 January, 2014
ending date : 29 January, 2014

Facilities descriptions :
http://visionair-browser.g-scop.grenoble-inp.fr/visionair/Browser/Catalogs/3DICC.HU.html

Recordings & Results :
The project aims to develop new multi-dimensional visualization methods based on wavelet transforms of the original speech waveform, its envelope, and the estimated f0 contour. The added dimensionality of the visualization will provide the means for the researcher to see the complexity of the speech signal in a new way and will help him or her formulate new hypotheses about the interconnections between the different types of information present in the signal. The new visualization methods will be based on CWT-based analyses projected into a 3D space using the latest methods in both holography and virtual reality. In its first stage, this project runs completely in parallel with project number 129 (Multi-dimensional visualization of speech signal hierarchy: implementation).

Conclusions :
The project lead will return to the SZTAKI 3DICC Lab in March 2014 to finish the project. A Matlab implementation of cochlear modelling (a cochleagram) was developed and tested. Several different speech stimuli, ranging from natural utterances to controlled synthetic vowel sounds, were used for visualization. Mark-up schemes were identified and evaluated with respect to the visualization means available at the facility. Due to time constraints, testing the CAVE environment was not possible; nevertheless, a possible mark-up for loading 3D images (VRML) was discussed. We also familiarized ourselves with the current technology and researchers at the facility and established channels of communication for the later stage of our visit.
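For reference, a rough Python equivalent of the cochleagram computation mentioned above might look as follows; the actual implementation was done in Matlab, and the gammatone filterbank design, ERB spacing, channel count and frame rate shown here are illustrative assumptions rather than the project's parameters.

import numpy as np
from scipy.signal import gammatone, hilbert, lfilter, resample_poly

def erb_space(f_lo, f_hi, n):
    """Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    e = np.linspace(21.4 * np.log10(1 + 0.00437 * f_lo),
                    21.4 * np.log10(1 + 0.00437 * f_hi), n)
    return (10.0 ** (e / 21.4) - 1) / 0.00437

def cochleagram(speech, fs, n_channels=32, f_lo=100.0, f_hi=7000.0, frame_fs=100):
    """Crude cochleagram: gammatone filterbank followed by per-channel envelopes."""
    rows = []
    for cf in erb_space(f_lo, f_hi, n_channels):
        b, a = gammatone(cf, 'iir', fs=fs)                  # IIR gammatone filter at cf
        band = lfilter(b, a, speech)                        # one cochlear channel
        env = np.abs(hilbert(band))                         # channel envelope
        rows.append(resample_poly(env, 1, fs // frame_fs))  # downsample to frame rate
    return np.vstack(rows)                                  # (channels x frames), low to high cf

# Example on a placeholder signal; recorded utterances or synthetic vowels would be used instead.
fs = 16000
speech = np.random.randn(2 * fs)
cg = cochleagram(speech, fs)                                # e.g. a 32 x 200 matrix to visualize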




Few images :

Logo_Visionair.png (Visionair logo)

VISIONAIR / Grenoble INP / 46 avenue Felix Viallet / F-38 031 Grenoble cedex 1 / FRANCE
Project funded by the European Commission under grant agreement 262044