Determining an optimal set of flesh points on tongue, lips, and jaw for continuous silent speech recognition

Jun Wang, Seongjun Hahm, Ted Mau

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Articulatory data have gained increasing interest in speech recognition, with or without acoustic data. Electromagnetic articulography (EMA) is one of the affordable techniques currently used for tracking the movement of flesh points on articulators (e.g., the tongue) during speech. Determining an optimal set of sensors is important for optimizing the clinical applications of EMA data, because attaching sensors to the tongue and other intraoral articulators is inconvenient, particularly for patients with neurological diseases. A recent study identified an optimal set of sensors on the tongue and lips (tongue tip, tongue body back, upper lip, and lower lip) for classifying isolated phonemes, words, or short phrases from articulatory movement data. This four-sensor set, however, has not been verified in continuous silent speech recognition. In this paper, we investigated the use of data from different sensor combinations in continuous speech recognition to verify that finding, using the publicly available MOCHA-TIMIT data set. The long-standing Gaussian mixture model-hidden Markov model (GMM-HMM) approach and the more recent deep neural network-HMM (DNN-HMM) approach were used as recognizers. Experimental results confirmed that the four-sensor set is optimal among the full set of sensors on the tongue, lips, and jaw. Adding upper incisor and/or velum data further improved recognition performance slightly.
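As context for the sensor-subset comparison described in the abstract, the sketch below shows one plausible way to slice an EMA recording into a per-subset feature matrix before passing frames to a GMM-HMM or DNN-HMM recognizer. The channel names, array layout, and helper function are illustrative assumptions, not the authors' implementation or the actual MOCHA-TIMIT file format.

```python
# Minimal sketch (assumed layout): select EMA channels for a given sensor subset
# before feeding frames to a recognizer. Channel names and ordering are hypothetical.
import numpy as np

# Hypothetical channel order: x/y coordinates for each flesh-point sensor.
CHANNELS = [
    "tongue_tip_x", "tongue_tip_y",
    "tongue_body_back_x", "tongue_body_back_y",
    "tongue_dorsum_x", "tongue_dorsum_y",
    "upper_lip_x", "upper_lip_y",
    "lower_lip_x", "lower_lip_y",
    "jaw_x", "jaw_y",
    "upper_incisor_x", "upper_incisor_y",
    "velum_x", "velum_y",
]

# The four-sensor set reported as optimal: tongue tip, tongue body back, upper and lower lips.
FOUR_SENSOR_SET = ["tongue_tip", "tongue_body_back", "upper_lip", "lower_lip"]


def select_sensor_subset(ema: np.ndarray, sensors: list[str]) -> np.ndarray:
    """Return only the columns (x/y channels) belonging to the requested sensors.

    ema: array of shape (num_frames, len(CHANNELS)).
    """
    cols = [i for i, name in enumerate(CHANNELS)
            if any(name.startswith(s + "_") for s in sensors)]
    return ema[:, cols]


if __name__ == "__main__":
    # Fake recording: 500 frames covering all 8 sensors (x and y per sensor).
    ema = np.random.randn(500, len(CHANNELS))
    features = select_sensor_subset(ema, FOUR_SENSOR_SET)
    print(features.shape)  # (500, 8): 4 sensors x 2 coordinates per frame
```

The resulting per-frame feature matrix would then be used to train and evaluate a recognizer for each sensor combination, which is the kind of comparison the paper reports.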

Original language: English (US)
Title of host publication: SLPAT 2015 - 6th Workshop on Speech and Language Processing for Assistive Technologies, Proceedings
Editors: Jan Alexandersson, Ercan Altinsoy, Heidi Christensen, Peter Ljunglof, Francois Portet, Frank Rudzicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 79-85
Number of pages: 7
ISBN (Electronic): 9781941643792
State: Published - 2015
Event: 6th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2015 - Dresden, Germany
Duration: Sep 11 2015 → …

Publication series

Name: SLPAT 2015 - 6th Workshop on Speech and Language Processing for Assistive Technologies, Proceedings

Conference

Conference: 6th Workshop on Speech and Language Processing for Assistive Technologies, SLPAT 2015
Country/Territory: Germany
City: Dresden
Period: 9/11/15 → …

Keywords

  • Articulation
  • Deep neural network
  • Dysarthria
  • Electromagnetic articulograph
  • Hidden Markov model
  • Silent speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Signal Processing
  • Linguistics and Language
