A biomechanical model of the human tongue for understanding speech production and other lingual behaviors

Adam Baker
Department of Linguistics
University of Arizona

Directions for future research

What would I do if I had a linguistics, speech science, mechanical engineering, or physiology dissertation to write? A subset of the following:

Study the rest of the data output.

This could be easy for someone who knows what he's doing. I studied a subset of the exterior node positions of the model. That was only about a third of the nodes; there are probably generalizations to be made about the interior deformation of the tongue. And all of the stress data are entirely untouched. Come to think of it, this might be more of a data visualization problem than anything...
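
Here is a rough sketch of the visualization idea in Python (the filename and column layout are invented for the example): read the node coordinates and a scalar stress measure for one simulation step and plot them as a colored 3D scatter.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical export: one row per exterior node: node_id, x, y, z, stress
    data = np.loadtxt("tongue_nodes_step042.csv", delimiter=",", skiprows=1)
    xyz, stress = data[:, 1:4], data[:, 4]

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    sc = ax.scatter(xyz[:, 0], xyz[:, 1], xyz[:, 2], c=stress, s=4)
    fig.colorbar(sc, label="effective stress")
    ax.set_title("Exterior tongue nodes, one simulation step")
    plt.show()

Animating that across simulation steps, or slicing the interior nodes along the midsagittal plane, would start to get at the interior-deformation question.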

Finish off the vocal tract.

The tongue is nice, but what we're really after is the acoustic output associated with muscle activation commands. Filling in the rest of the vocal tract would be reasonably simple. You could create a vocal tract area function without too much difficulty, and you could at least calculate the formants. (Not that I'm opposed to studying consonants, but that would be a bit harder.) I would use the Visible Woman myself, but you could also make a model from some MR or CT images.
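
One standard route to the formants, sketched here with a placeholder area function (nothing below is taken from the dissertation): treat the area function as a chain of lossless cylindrical tubes, multiply their chain (ABCD) matrices, and read the formants off the peaks of the volume-velocity transfer function, assuming a closed glottis and an ideal open lip end with no radiation load.

    import numpy as np

    rho, c = 1.2, 350.0          # air density (kg/m^3), speed of sound (m/s)

    # Placeholder area function, glottis -> lips, equal-length sections (m^2, m)
    areas = np.array([2.6, 1.8, 1.0, 0.8, 1.2, 2.0, 3.2, 4.0]) * 1e-4
    seg_len = 0.175 / len(areas)

    def transfer(freq_hz):
        """|U_lips / U_glottis| for a lossless tube chain, open at the lips."""
        k = 2 * np.pi * freq_hz / c
        M = np.eye(2, dtype=complex)
        for A in areas:                                   # glottis -> lips
            kl = k * seg_len
            M = M @ np.array([[np.cos(kl), 1j * rho * c / A * np.sin(kl)],
                              [1j * A / (rho * c) * np.sin(kl), np.cos(kl)]])
        return 1.0 / abs(M[1, 1])                         # P_lips = 0  =>  H = 1/D

    freqs = np.arange(50.0, 5000.0, 5.0)
    H = np.array([transfer(f) for f in freqs])
    formants = [f for i, f in enumerate(freqs[1:-1], 1)
                if H[i] > H[i - 1] and H[i] > H[i + 1]]
    print("Estimated formants (Hz):", [int(round(f)) for f in formants[:4]])

As a sanity check, a uniform 17.5 cm tube comes out at roughly 500, 1500, 2500, and 3500 Hz with this setup, as it should.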

Morph the mesh to match the geometry of a particular subject.

This has already been done in the literature with existing models. I'm not familiar with the morphometric techniques you would have to use, but I doubt that they are very difficult. You can just take the mesh I've used for my simulations and adjust the coordinates. Don't change the structure of the mesh, or else you'll have to figure out a way to make appropriate muscle assignments all over again. (I don't know if that's true: maybe you can just apply the morphometric algorithm to the volumetric data and go from there.)
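
One hedged sketch of the coordinate-adjustment route (the filenames and landmark files are hypothetical): fit a thin-plate-spline deformation to paired anatomical landmarks and push every node through it, leaving the element structure, and therefore the muscle assignments, alone.

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    src = np.loadtxt("landmarks_visible_woman.txt")   # (k, 3) landmarks on the existing mesh
    dst = np.loadtxt("landmarks_subject.txt")         # (k, 3) same landmarks, new subject
    nodes = np.loadtxt("tongue_mesh_nodes.txt")       # (n, 3) all node coordinates

    # Smooth displacement field fitted to the landmark correspondences, applied
    # to every node; connectivity (and so the muscle assignments) stays untouched.
    warp = RBFInterpolator(src, dst - src, kernel="thin_plate_spline")
    np.savetxt("tongue_mesh_nodes_subject.txt", nodes + warp(nodes))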

Make the jaw and hyoid mobile (easier).

Find somebody with a point-tracking dataset with jaw and hyoid position. Warp that data (using anatomical landmarks) to match the Visible Woman dataset. Sample the point-tracking dataset to get good coverage of the range of jaw and hyoid motion. Use FEBio to apply prescribed nodal displacements to the jaw and hyoid attachments of the model. Then collect 128-sample datasets (corresponding to my D1) for each jaw/hyoid displacement. (Note: this is essentially what I did to test Halle's predictions, but that was just moving the hyoid, and it was only one muscle activation schema.) I showed that the 128-sample dataset generalizes well to the 2086-sample set, so just collecting 128 postures from each jaw/hyoid position should be fine. For n jaw/hyoid positions, you'll only need to calculate 128n tongue postures. Easy if you can run the jobs in parallel. Then run a PCA on all the nodal coordinates (or a subset of them, as I did) and see what happens. Theoretically this should give you all of the possible tongue shapes that can be achieved with the control model.
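
The PCA at the end is simple enough that a sketch may still be useful; this one assumes the pooled postures have been stacked into an array with one row per simulation, each row being the flattened coordinates of the tracked nodes (the filename is invented).

    import numpy as np

    postures = np.load("postures.npy")          # (n_samples, n_coords), e.g. 128*n rows
    mean = postures.mean(axis=0)
    U, S, Vt = np.linalg.svd(postures - mean, full_matrices=False)

    explained = S**2 / np.sum(S**2)
    print("Variance explained by the first 5 PCs:", np.round(explained[:5], 3))

    scores = U * S      # weight of each posture on each component
    components = Vt     # the component shapes themselves (add `mean` back to visualize)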

Make the jaw and hyoid mobile (harder).

Make an actual biomechanical model of the vocal tract, including all the soft tissue bits and all the muscles. Not easy. I imagine you could assume that the skeletal structures are immobile, aside from the jaw and hyoid. Pretty hard to create the anatomical model, though, and then very computationally expensive to run the simulation. The easy (well, easier) alternative would be to assume that the bones are point masses, acted on by Hill-type muscle actuators, and neglect the resistance to movement from the soft tissue. At that point, though, I'd just as soon have a phenomenological model of jaw and hyoid movement, as described above.
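
For concreteness, here is a deliberately crude sketch of what the point-mass alternative amounts to: one mass, one Hill-type actuator (Gaussian force-length, hyperbolic force-velocity), a constant opposing load, and forward-Euler integration. Every parameter value is made up for illustration.

    import math

    F_MAX, L_OPT, V_MAX = 15.0, 0.030, 0.25      # peak force (N), optimal length (m), max shortening velocity (m/s)
    MASS, LOAD, L0, DT = 0.2, 4.0, 0.035, 1e-4   # mass (kg), opposing load (N), initial length (m), time step (s)

    def hill_force(act, length, short_vel):
        """Active force = activation * F_max * force-length * force-velocity."""
        f_l = math.exp(-((length / L_OPT - 1.0) / 0.45) ** 2)        # Gaussian force-length
        if short_vel > 0:                                            # concentric (shortening)
            f_v = max((V_MAX - short_vel) / (V_MAX + 4.0 * short_vel), 0.0)
        else:                                                        # lengthening: crude isometric clamp
            f_v = 1.0
        return act * F_MAX * f_l * f_v

    shortening, vel = 0.0, 0.0                   # displacement of the mass and its velocity
    for _ in range(int(0.2 / DT)):               # 200 ms at a fixed activation of 0.8
        force = hill_force(0.8, L0 - shortening, vel) - LOAD
        vel += DT * force / MASS
        shortening += DT * vel
    print(f"Displacement after 200 ms: {shortening * 1000:.1f} mm")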

Use a better control model than I did.

I was limited to whole muscle activation (or half-muscle activation) because I didn't have data about the compartmental organization of the tongue. That research should be coming out soon (written Fall 2008), so keep an eye on the anatomy journals. If you can get a nice description of the compartments in tongue muscle, you can treat each of those compartments as a separate control parameter. At that point you should have a pretty good model of everything the tongue can do. The flip side, of course, is sampling the control space. Revenge of the Curse of Dimensionality. I don't know what to do about that.
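
To put a number on it: with, say, 20 compartments and even three activation levels each, a full grid is 3^20, about 3.5 billion simulations. A space-filling design (Latin hypercube, Sobol, and the like) doesn't solve the problem, but it does spread a fixed simulation budget over the space more evenly than a grid; here is a small self-contained sketch, with an arbitrary compartment count and sample size.

    import numpy as np

    def latin_hypercube(n_samples, n_dims, seed=None):
        """One Latin hypercube sample in [0, 1]^n_dims: each dimension is cut
        into n_samples strata and every stratum is used exactly once."""
        rng = np.random.default_rng(seed)
        pts = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
        for d in range(n_dims):
            pts[:, d] = pts[rng.permutation(n_samples), d]
        return pts

    # e.g. 512 activation patterns over 20 hypothetical compartments, instead of
    # a full factorial grid over the same space
    activations = latin_hypercube(512, 20, seed=0)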
