TNA project : Multi-dimensional visualization of speech signal hierarchy: implementation



Acronym : 129-Multi-dimensional visualization of speech signal hierarchy implementation-Simko

Project Lead : Juraj Simko
From : University of Helsinki

Dates : from 24th November 2013 to 29th November 2013

Description :



Motivation and objectives :
This project builds upon our recent work on representing and analyzing speech prosody using the continuous wavelet transform (CWT). Wavelets emerged independently in physics, mathematics, and engineering, and are now a widely used tool for the analysis of complex signals, including electrophysiological, visual, and acoustic signals. In particular, wavelets have found applications in several areas related to speech prosody: the first steps of signal processing by the auditory periphery are well described by models that rely on wavelets; they are used for robust speech enhancement in noisy signals with unknown or varying signal-to-noise ratio, for automatic speech segmentation, and for segregating the speech signal along various dimensions in a manner similar to mel-cepstral coefficients. The multiscale structure of the wavelet transform has also been exploited in musical beat tracking. Quantitative analysis of speech patterns through wavelets may also be relevant for understanding the cortical processing of speech.

The speech signal carries information in a hierarchical fashion, with temporal scales ranging from microseconds to several seconds. The advantage of applying a scale-invariant analysis such as the CWT across a range of temporal scales is that the inherent hierarchical nature of the signal becomes visible. The analysis reveals not only how information is distributed in time, but also possible interdependencies between levels, which appear as clearly visible tree structures. These are typically easy to interpret in terms of phonological linguistic hierarchies that describe how linguistic information is distributed in the speech signal. The complex structure and its links with multiple levels of the speech hierarchy, however, make the results challenging to analyze using traditional 2D visualization methods.
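As an illustration of the multi-scale analysis described above, the following is a minimal numpy sketch of a continuous wavelet transform: the signal is convolved with a Mexican-hat (Ricker) wavelet at a range of scales, yielding one coefficient row per temporal scale. The wavelet choice, scale range, and the toy amplitude-modulated "speech-like" signal are illustrative assumptions, not the project's actual analysis pipeline.

```python
import numpy as np

def ricker(points, a):
    # Mexican-hat (Ricker) wavelet sampled at `points` positions, width `a`
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))

def cwt(signal, widths):
    # One row of wavelet coefficients per scale: convolve the signal
    # with a wavelet of that width ("same" keeps the input length).
    out = np.empty((len(widths), len(signal)))
    for i, w in enumerate(widths):
        n = min(10 * int(w), len(signal))
        out[i] = np.convolve(signal, ricker(n, w), mode="same")
    return out

fs = 4000
t = np.arange(0, 1.0, 1.0 / fs)
# toy signal: a 200 Hz carrier modulated at 4 Hz (roughly a syllable rate)
x = (1.0 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
scales = np.geomspace(2, 300, 30)   # fine (segmental) to coarse (phrasal) scales
coefs = cwt(x, scales)
print(coefs.shape)   # (30, 4000): one row per temporal scale
```

Plotting the magnitude of `coefs` as a scalogram makes the fast carrier visible at small scales and the slow modulation at large scales, which is the hierarchical structure the project aims to visualize in 3D.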
We will develop a new multi-dimensional visualization method combining wavelet transforms of the original speech waveform and of its envelope with the estimated f0 contour. The added dimensionality of the visualization will let researchers see the complexity of the speech signal in a new way and help them formulate new hypotheses about the interconnections between the different types of information present in the signal. The new visualization methods will be based on the CWT-based analyses projected into 3D space using the latest methods in both holography and virtual reality. In parallel with the development of the visualization methods, the results will be thoroughly evaluated; the evaluation process is the subject of a parallel proposal, "Multi-dimensional visualization of speech signal hierarchy: evaluation". Analyzing and representing the speech signal in multiple dimensions simultaneously should, in principle, provide more material for the human visual system to pick up patterns that we know are there, both from theoretical linguistics and from the thousands of phonetic experiments on speech prosody that have used, e.g., manipulated signals or simple operationalizations of fairly crude signal representations.
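One of the inputs named above is the amplitude envelope of the waveform. As a hedged sketch of how such an envelope might be obtained, the following uses full-wave rectification followed by moving-average smoothing; this is a simple stand-in, and the project's actual envelope-estimation method is not specified in this report.

```python
import numpy as np

def envelope(x, fs, win_ms=20.0):
    # Amplitude envelope: full-wave rectification, then a moving-average
    # smoother whose window (here 20 ms) removes the carrier oscillation.
    n = max(1, int(fs * win_ms / 1000.0))
    kernel = np.ones(n) / n
    return np.convolve(np.abs(x), kernel, mode="same")

fs = 4000
t = np.arange(0, 1.0, 1.0 / fs)
# toy input: a 200 Hz carrier modulated at 4 Hz
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
env = envelope(x, fs)
print(env.shape)   # (4000,): same length as the input
```

The envelope retains the slow 4 Hz modulation while the 200 Hz carrier is smoothed out, so a CWT of `env` emphasizes syllable- and phrase-scale structure rather than segmental detail.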

Teams :
The work is proposed by the SigMe (Multi-sensory signals and meanings at the University of Helsinki) group, which investigates the extent to which human comprehension of the environment, speech and language is grounded on multi-sensory interaction and action.

Dates :
starting date : 24 November, 2013
ending date : 29 November, 2013

Facilities descriptions :
http://visionair-browser.g-scop.grenoble-inp.fr/visionair/Browser/Catalogs/3DICC.HU.html

Recordings & Results :
The project will develop new multi-dimensional visualization methods based on wavelet transforms of the original speech waveform and of its envelope, together with the estimated f0 contour. The added dimensionality of the visualization will let researchers see the complexity of the speech signal in a new way and will help them formulate new hypotheses about the interconnections between the different types of information present in the signal. The new visualization methods will be based on the CWT-based analyses projected into 3D space using the latest methods in both holography and virtual reality. In its first stage, this project runs completely in parallel with project number 128 (Multi-dimensional visualization of speech signal hierarchy: evaluation).

Conclusions :
The project lead will return to the SZTAKI 3DICC Lab in March 2014 to finish the project. A Matlab implementation of cochlear modeling (a cochleagram) was developed and tested. Several different speech stimuli, ranging from natural utterances to controlled synthetic vowel sounds, were used for visualization. Mark-up schemes were identified and evaluated with respect to the visualization means available at the facility. Due to time constraints, testing the cave environment was not possible; nevertheless, a possible mark-up for loading static 3D images (VRML) was discussed. We also familiarized ourselves with the current technology and the researchers at the facility, and established channels of communication for the later stage of our visit.
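To illustrate the kind of cochleagram computation mentioned above, here is a minimal Python sketch (the project's actual implementation was in Matlab and is not reproduced here): a bank of gammatone filters with ERB-scaled bandwidths is applied to the signal, and the half-wave rectified outputs are averaged per frame. The channel count, frequency range, and frame length are illustrative assumptions.

```python
import numpy as np

def erb(f):
    # Equivalent rectangular bandwidth approximation (Glasberg & Moore)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, dur=0.025, order=4):
    # Impulse response of a 4th-order gammatone filter centred at fc
    t = np.arange(0, dur, 1.0 / fs)
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def cochleagram(x, fs, n_channels=16, fmin=100.0, fmax=3000.0, frame_ms=10.0):
    # Filter with a gammatone bank, half-wave rectify, average per frame
    fcs = np.geomspace(fmin, fmax, n_channels)
    frame = int(fs * frame_ms / 1000.0)
    n_frames = len(x) // frame
    out = np.empty((n_channels, n_frames))
    for i, fc in enumerate(fcs):
        y = np.maximum(np.convolve(x, gammatone_ir(fc, fs), mode="same"), 0.0)
        out[i] = y[: n_frames * frame].reshape(n_frames, frame).mean(axis=1)
    return out

fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
x = np.sin(2 * np.pi * 440 * t)   # toy input: a pure 440 Hz tone
C = cochleagram(x, fs)
print(C.shape)   # (16, 50): frequency channels x 10 ms time frames
```

Each row of `C` tracks the energy in one auditory frequency channel over time; stacking such representations alongside the waveform CWT is the kind of multi-layer material the 3D visualization is meant to display.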







VISIONAIR / Grenoble INP / 46 avenue Felix Viallet / F-38 031 Grenoble cedex 1 / FRANCE
Project funded by the European Commission under grant agreement 262044