Speech:Models AM Build



Project Notes

 * Unix Notes
 * Active Directory
 * Backups
 * Network Bridge
 * Speech S/W Installation
 * Speech Corpus Setup
 * Switchboard Data Notes
 * Experiment Setup
 * Scripts Page
 * Model Building
   * Step 1: Run a Train
   * Step 2: Create the Language Model
   * Step 3: Run the Decode

Model Building
A general discussion of how to build and verify models. It covers the initial setup and preparation of data, building a statistical language model, and finally generating a robust set of acoustic models (training) and verifying them by testing (decoding) on the trained corpus. For detailed steps on training and decoding, see the sub-steps under Model Building above.


 * Data Preparation
 * Language Modeling
 * Building & Verifying Models
 * GenTrans script

Building and Verifying Acoustic Models

 * Building acoustic model
 * Building language model

The links above have more details about training and decoding acoustic models.

The task is to create an acoustic model and a language model by setting up a "trainer" and a "decoder".

Building an Acoustic Model
A mini train and decode was completed several times, each time with different data, following these steps. The purpose of this task is to use recorded conversations, saved as .wav files, together with their transcripts to build a speech recognition tool. The trainer reads the .wav files, the phoneme dictionary, the dictionary, and the transcript of the conversations, and then aligns the audio with the transcript. To do this, the trainer needs a dictionary that contains every word in the transcript. It also needs an accurate phoneme dictionary that covers every word in the dictionary.
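Missing words are a common cause of failed trains, so it can help to check the inputs before starting. The sketch below is a hypothetical pre-train check, not one of the project's scripts. It assumes CMU Sphinx-style file formats: transcript lines such as "<s> HELLO WORLD </s> (utt_001)", dictionary lines such as "HELLO HH AH L OW", and a phone list with one phone per line. It treats the "phoneme dictionary" as that phone list, which is an assumption. The helper names (transcript_words, parse_dictionary, check_inputs) are illustrative only.

```python
"""Hypothetical pre-train consistency check (illustrative sketch only).

Assumed CMU Sphinx-style formats:
  transcript line: "<s> HELLO WORLD </s> (utt_001)"
  dictionary line: "HELLO HH AH L OW"
  phone list:      one phone per line, e.g. "HH"
"""


def transcript_words(lines):
    """Collect every word used in the transcript, skipping markers and utterance ids."""
    words = set()
    for line in lines:
        for token in line.split():
            # Skip sentence markers and the trailing "(utterance_id)" token.
            if token in ("<s>", "</s>") or (token.startswith("(") and token.endswith(")")):
                continue
            words.add(token)
    return words


def parse_dictionary(lines):
    """Map each dictionary word to its list of phones."""
    entries = {}
    for line in lines:
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries


def check_inputs(transcript_lines, dict_lines, phone_lines):
    """Return (transcript words missing from the dictionary,
    dictionary phones missing from the phone list), both sorted."""
    dictionary = parse_dictionary(dict_lines)
    phones = {line.strip() for line in phone_lines if line.strip()}
    missing_words = sorted(transcript_words(transcript_lines) - set(dictionary))
    used_phones = {phone for pron in dictionary.values() for phone in pron}
    missing_phones = sorted(used_phones - phones)
    return missing_words, missing_phones
```

For example, `check_inputs(["<s> HELLO WORLD </s> (utt_001)"], ["HELLO HH AH L OW", "WORLD W ER L D"], ["HH", "AH", "L", "OW", "W", "ER"])` reports no missing words. It does report that the phone "D" is missing from the phone list.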

Verifying an Acoustic Model
The decoder checks whether the trainer actually worked and how accurate the resulting models are. Decoding is done by running a single script, run_decode.pl.
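Recognition accuracy is commonly summarized as word error rate (WER). WER counts the substitutions, deletions, and insertions needed to turn the decoded text into the reference transcript, divided by the number of reference words. This page does not say how run_decode.pl scores its output. The sketch below is therefore a generic WER computation using word-level edit distance, not the project's scoring code. The function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Generic WER sketch: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, decoding "the cat sat" as "the bat sat" is one substitution out of three reference words, which gives a WER of 1/3.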


 * The table described above lists all of the scripts used to run the trainer. Some of these scripts call other scripts, and the location of each script is listed in the table
 * The main script directory is /speechtools/SphinxTrain-1.0/taskName (e.g. train1)
 * The scripts also call executable files
 * The executable files are located in /mnt/main/local/bin/
 * make_feats.pl calls the sphinx_fe executable
 * Runall.pl calls the script slave_align.pl, which is located in the /04.vtln_align/ directory
 * slave_align.pl calls the sphinx3_align executable
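The call relationships in the notes above can be recorded as a small graph. That graph can then be used to confirm that every executable a train needs is present before starting. The sketch below is hypothetical and is not part of the project's scripts. It includes only the calls listed above, although the real scripts may call more. It assumes that any name without a ".pl" suffix is an executable in /mnt/main/local/bin/. The names CALLS, BIN_DIR, executables_needed, and missing_executables are illustrative.

```python
import os

# Hypothetical call graph built only from the notes above; the real
# scripts may call additional scripts or executables.
CALLS = {
    "make_feats.pl": ["sphinx_fe"],
    "Runall.pl": ["slave_align.pl"],
    "slave_align.pl": ["sphinx3_align"],
}

# Assumption: non-".pl" names are executables stored in this directory.
BIN_DIR = "/mnt/main/local/bin"


def executables_needed(script, calls=CALLS):
    """Walk the call graph from `script` and collect every executable it reaches."""
    needed, stack, seen = set(), [script], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for callee in calls.get(name, []):
            if callee.endswith(".pl"):
                stack.append(callee)  # another script: keep walking
            else:
                needed.add(callee)    # an executable: record it
    return needed


def missing_executables(script, bin_dir=BIN_DIR, calls=CALLS):
    """Return the executables reachable from `script` that are absent or not executable in bin_dir."""
    return sorted(
        exe for exe in executables_needed(script, calls)
        if not os.access(os.path.join(bin_dir, exe), os.X_OK)
    )
```

For example, `executables_needed("Runall.pl")` follows Runall.pl to slave_align.pl and returns `{"sphinx3_align"}`.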


 * Decode
   * A couple of scripts are used to build the language model needed for decoding
   * lm_create.pl calls the sphinx_lm_convert executable
   * The page below explains in detail how a basic trainer works. Our trainer and decoder are configured for our own setup, so they do not run exactly as described there, but the page is useful for understanding how the process works
   * http://cmusphinx.sourceforge.net/wiki/tutorialam
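The internals of lm_create.pl are not documented on this page. As a rough illustration of what a statistical language model estimates, the sketch below computes unsmoothed maximum-likelihood bigram probabilities, P(w2 | w1) = count(w1 w2) / count(w1), from transcript sentences wrapped in <s> ... </s>. Real language-model toolkits add smoothing and back-off, so this is a simplified stand-in and not what lm_create.pl does. The function name bigram_probabilities is hypothetical.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Unsmoothed maximum-likelihood bigram estimates P(w2 | w1).

    Each sentence is wrapped in <s> ... </s>. No smoothing or back-off is
    applied, so unseen word pairs simply have no entry.
    """
    history_counts = Counter()  # how often each word appears as the first word of a pair
    pair_counts = Counter()     # how often each (w1, w2) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            history_counts[w1] += 1
            pair_counts[(w1, w2)] += 1
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in pair_counts.items()
    }
```

For the two sentences "yes please" and "yes thanks", every sentence starts with "yes", so P(yes | <s>) = 1.0. "yes" is followed once by each of the other two words, so P(please | yes) = P(thanks | yes) = 0.5.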