Speech:Spring 2014 Colby Chenard Log



Week Ending February 4th, 2014
Jan 30: I am going to attempt to get as familiar with the system as possible. I will log in as root and attempt to navigate around caesar a bit and possibly try to run a train.
 * Task:

Feb 1: Logged in and read logs. Will attempt to run a train later tonight or tomorrow.

Feb 3: Created a full dictionary for a first 5hr train using genTrans5.pl and Eric's updateDict.pl script. Possibly attempt the first_5hr Train.

Feb 4: Logged in. Read logs.

Jan 30: Logged into caesar as root, but it appears our student accounts haven't been created yet, so I will have to wait on that before running a train. The notes specifically say not to run any commands as root. I was able to check which accounts exist by cd'ing to /etc and running 'more /etc/passwd'.
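Since /etc/passwd is world-readable, the account check works from any directory and needs no root access; a quick sketch of listing just the usernames:

```shell
# /etc/passwd is world-readable, so no root access (and no cd) is needed;
# the first colon-separated field of each line is the account name.
cut -d: -f1 /etc/passwd | head -5
```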
 * Results:

Feb 3: 1. Colby J and I ran updateDict.pl and, to our surprise, it seemed to run without error, so we took it a step further. 2. Then we tried to initiate a train, and it seems to be running correctly!

We made it as far as phase three; however, it is a 6hr train, so we aren't out of the woods yet. This is a big milestone because we finally seem to have a functional understanding of the system.

Now we can start to experiment with the acoustic modeling.

Feb 3: Worked together with Colby Johnson.
 * Plan:

1. Create a dictionary against our transcription file, which was generated using genTrans5.pl. 2. Using Experiment 0166's add2.txt file and Eric's updateDict.pl, obtain a list of words missing from the dictionary and add them to the created dictionary. 3. Once we have a full dictionary for the first_5hr train, attempt to run the train.

Feb 3: Our main worry is that we don't know how to use the updateDict.pl script correctly, because the documentation is a bit vague. Eric also raised some concerns about the functionality of some optional params in the script.
 * Concerns:

Week Ending February 11, 2014
Feb 8: This week my goal is to run a few trains on my own as well as decode them to try and get a better grasp on how to start tuning parameters to improve our baseline.
 * Task:

Feb 9: Logged in.

Feb 10: Run my first train (first_5hr train).

Feb 11: Try to run a train again. Retrace my steps from yesterday to see what went wrong.

Feb 10: I was able to run the train, but it errored out at step 7:

Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
) occurs in the phonelist (/mnt/main/Exp/0157/etc/0157.phone), but not in any word in the transcription (/mnt/main/Exp/0157/etc/0157_train.trans)
(the same warning was repeated three more times)
 * Results:

Something failed: (/mnt/main/Exp/0157/scripts_pl/00.verify/verify_all.pl)

I think there may be an issue with my phone file. I will redo that tomorrow and see if it will pass step 7.

Feb 11: So Colby J and I ended up troubleshooting my issue together. We tried redoing all the steps:
1. Re-ran genTrans, but used v5 instead of v6 because there have been issues when using v6. That didn't fix it.
2. Added words to the dictionary from previous experiments... again no luck.
3. Re-created the phones and feat.params as well as the .filler and fileids files... it still failed on step 7.
4. Finally, after multiple attempts, we noticed an error message saying it was only missing one word, 'SH': WARNING: This word: SH was in the transcript file, but is not in the dictionary ([DEL: SHE HAD A FALL AND UH FINALLY UH SHE HAD UH PARKINSON'S DISEASE AND IT GOT SO MUCH THAT SHE COULD NOT TAKE CARE OF HER HOUSE SH THEN SHE LIVED IN AN APARTMENT AND UM :DEL]). Do cases match?
5. We added that word and re-ran the train, and it worked! So hopefully it will run overnight, and then I can run a decode tomorrow.
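The warning that finally surfaced 'SH' is exactly what a missing-word check against the dictionary catches. A minimal sketch of that check, with toy files standing in for the real .dic and .trans (updateDict.pl's actual logic may differ):

```shell
# Find transcript words with no dictionary entry -- the kind of check
# that surfaced the missing 'SH'. Toy files stand in for the real ones.
mkdir -p /tmp/dictcheck && cd /tmp/dictcheck
printf 'SHE SH IY\nHAD HH AE D\nA AH\nFALL F AO L\n' > toy.dic
printf 'SHE HAD A FALL SH\n' > toy.trans

# every unique transcript word must start a line in the dictionary
tr ' ' '\n' < toy.trans | sort -u | while read -r w; do
  grep -q "^$w " toy.dic || echo "MISSING: $w"
done
# prints: MISSING: SH
```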

Feb 8: I will attempt to run first_5hr train, since Colby and I have compiled a solid dictionary list for that. I also know that David was working on getting a better dictionary list, so maybe I can use that to run some other trains such as tiny or mini.
 * Plan:

Feb 11: Carefully examine and execute each step of running a train to cut down on errors. Run a train successfully, as well as a decode.

Feb 10: I have noticed that the wiki page on running a train could use some updating. The first initial steps can be eliminated by running train_01.pl, and the next set of steps by train_02.pl. However, I did these steps manually, and for some reason I was getting this error:

Directory: /mnt/main/Exp/0157
miraculix Exp/0157> /mnt/main/scripts/train/scripts_pl/setup_SphinxTrain.pl -task 0157
Making basic directory structure
Couldn't find executables. Did you compile SphinxTrain?

Right here it's asking me if I compiled it, so I'm not really sure what is happening.

miraculix Exp/0157> ls
bin  bwaccumdir  etc  feat  logdir  model_architecture  model_parameters  wav
 * Concerns:

But when I just run train_01.pl, all the dirs are created with the necessary files, so this will have to be examined further. I also noticed that it would be helpful to add a few steps to help future users run their first train more easily:
1. Use the most up-to-date dictionary, which at this point is .0.7a.
2. After creating a dictionary list in your Exp//etc/ dir, prune it; after the prune, compile the missing words using Eric's updateDict.pl script.
3. Since I am doing the first_5hr train, I know it is missing certain words, because Colby J and I have run it before. With Eric's script you pass two params: the master dictionary, which is .0.7a, and /mnt/main/0116/etc/add2.txt.
4. That should give you a solid dictionary so you can run the first_5hr train without it erroring out; it may still have errors, but it will run successfully.
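Steps 1-3 above amount to patching the master dictionary with the known-missing entries. A rough sketch with toy stand-ins for the .0.7a dictionary and add2.txt (the real updateDict.pl does more than this):

```shell
# Toy stand-ins: master.dic plays the .0.7a master dictionary, add2.txt
# holds the known-missing words (real path: /mnt/main/0116/etc/add2.txt).
mkdir -p /tmp/dictprep && cd /tmp/dictprep
printf 'HELLO HH AH L OW\nWORLD W ER L D\n' > master.dic
printf 'UH AH\nUM AH M\n' > add2.txt

# merge, keeping one entry per headword, sorted for easy lookup
cat master.dic add2.txt | sort -u -k1,1 > first_5hr.dic
grep -c '' first_5hr.dic   # prints 4 (number of merged entries)
```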

Week Ending February 18, 2014
Feb 15: Logged on.
 * Task:

Feb 16: Logged on.

Feb 17: Because of last week's failure, I would like to troubleshoot my errors, hopefully get past them, and run a successful train. In addition, I would like to optimize parameters to achieve the lowest possible word error rate.

Feb 18: I ran a total of 6 trains, all with slightly varied parameters. Experiments 0162, 0164, and 0166 are 10hr trains; experiments 0168, 0170, and 0182 are 5hr trains.
 * Results:

Colby J and I wanted to see whether raising the senone value a bit above the recommended value, combined with varying densities, would achieve a better WER.
 * Mixtures:
 * Experiments 0162, 0164, and 0166 had densities of 8, 16, and 64.
 * Experiments 0168, 0170, and 0182 had densities of 8, 16, and 64 as well.
 * All 6 experiments used a senone value of 5000.
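The six-run grid above is just two corpus sizes crossed with three densities at a fixed senone count; spelled out as a loop (the echoed line is illustrative, not the real train invocation):

```shell
# Enumerate the experiment grid from the log: {5hr,10hr} x densities {8,16,64},
# all at 5000 senones. The echo stands in for the actual train command.
for hrs in 5hr 10hr; do
  for den in 8 16 64; do
    echo "train corpus=$hrs senones=5000 densities=$den"
  done
done
```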

All the trains ran successfully. Then I started the decodes on them, and they seemed fine, so I left for work. But when I came back, they had errored out for some reason.

I tried to re-run them, but it tells me that I don't have permission to run them. This is very strange behavior; I will have to do some more investigation to find a workaround or a solution.

Feb 17: After talking with Colby, we decided the best way to tackle this is by collecting as much data as possible, with the three of us all running different combinations of parameters and comparing the results. So what I am going to do is run several 10 hour and 5 hour trains with varying densities, in the hope of finding something worthwhile to investigate further.
 * Plan:

Feb 17: It seems there is much debate regarding the effect of senone values, that is, how much they really affect the word error rate. After some research, the general consensus in our group is that there is a relationship between the senone value and the size of the vocabulary: the larger the vocabulary, the higher the senone value. I would definitely like to run some more trains of my own to investigate this theory further. My concern with the senones is that the values could be too high and we could be overtraining. As a caveat, our values aren't that much higher than the recommended values, so there should not be too much of a difference.

Vocabulary   Hours in db   Senones   Densities   Example
20           5             200       8           Tidigits Digits Recognition
100          20            2000      8           RM1 Command and Control
5000         30            4000      16          WSJ1 5k Small Dictation
20000        80            4000      32          WSJ1 20k Big Dictation
60000        200           6000      16          HUB4 Broadcast News
60000        2000          12000     64          Fisher Rich Telephone Transcription
 * Concerns:

Week Ending February 25, 2014
Feb 22: Logged in.

Feb 23: Logged in.
 * Task:

24Feb2014 All work was done collaboratively with Colby C
 * Create a 100hr subset of the full data set
 * Learn about past sphinx training and decode parameters used
 * Attempt to run tests on the 10hr AMs using small data subset
 * Create graphs with Completed decode data

25Feb2014
 * We need to go back and try to re-run the decodes. We talked with David and found that we weren't doing it the correct way, so we are going to go with his suggestions and re-run our decodes that way.

25Feb2014
 * Results:
 * So unfortunately our decodes failed... We tested on different data than we trained against, but we don't think the failure was inherently for that reason; we think it failed because we set it up wrong.

(Now we have a 100hr data set to train off of)
 * Plan:
 * Build 100hr data set from the full data set
 * Create 100hr Dir
 * 100hr
 * 100hr/train
 * 100hr/train/trans
 * 100hr/train/wav
 * Copy 1/3 of the text to a new txt file
 * Upload to server
 * Run copySph.pl to make symbolic links to the SPH files needed
 * /mnt/main/scripts/user/copySph.pl
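The wav setup in the plan above can be sketched as symlinking rather than copying the .sph audio, which is presumably what copySph.pl does for us; the /tmp paths below are toy stand-ins for the real /mnt/main layout:

```shell
# Build the 100hr directory tree and link (not copy) the audio files.
# All paths here are toy stand-ins, not the real server locations.
mkdir -p /tmp/100hr/train/trans /tmp/100hr/train/wav /tmp/full/wav
: > /tmp/full/wav/sw02001.sph                  # fake .sph audio file
printf 'sw02001\n' > /tmp/100hr/fileids        # conversations we kept (1/3 cut)

# one symlink per listed conversation saves duplicating hours of audio
while read -r id; do
  ln -sf "/tmp/full/wav/$id.sph" "/tmp/100hr/train/wav/$id.sph"
done < /tmp/100hr/fileids
```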

24Feb2014 / 25Feb2014 Training:
 * So the correct way to do this is to go through the entire setup process for a train without actually running it.
 * Things we need:
 * Dictionary, feats, and language model.
 * Then we run a decode as we normally would, but change the second parameter to the experiment # of the acoustic model we would like to test against.
 * So we will decode against the 5hr test data, as a sort of subset, while our acoustic model was built from the 10hr corpus.
 * Concerns:
 * Do we need OOV (out-of-vocabulary) words in the transcript, or can they be removed?
 * Find where inefficiencies lie in the training process

Decode:
 * Interpreting parameter names
 * Time... (parallelization)
 * Creating a decoding with smaller data sets

25Feb2014


 * The Future
 * One of my main concerns looking forward is optimization. Right now we are averaging about 15% to 30% error rates, and even the 15% run is well overtrained.


 * After some research, Colby and I found that others before us have had much better results, some even as low as 7%; with that in mind, I would really like to find out what we can do to make our results more optimal.
 * I think we need to look at our dictionary and try to compile it better; however, there are so many variables to account for that we need to try to make the process less cumbersome.

Week Ending March 4, 2014
March 3rd:
 * Task:

This week was the perfect storm: Thursday and Friday I was out of town for work training, and then, while working with Colby and David on Friday night, we managed to overload the server and shut it down for the weekend, which kind of slowed our roll. However, looking forward, it seems that Colby J has made some pretty valuable progress with parallelization. I would like to run a train using it myself to see if it works the way he says it does.


 * Results:

March 3rd:
 * Plan:
 * I would like to run 4 trains total:
 * A 5hr train without parallel processing, and default parameters
 * A 5hr with parallel processing, and default parameters
 * A 10hr without parallel processing, and default parameters
 * A 10hr with parallel processing, and default parameters

This way I will have solid data to compare... The hope is to prove that Colby's theory is correct. If that is the case, it will be a great step forward.

March 4th:

Setting up a train for 100hr of data with a clean transcription file. Apparently, after conversation 3170 there is little to no overlap in the audio files, because the collectors changed their technique. We have been noticing that the overlap causes our WER to be a lot higher than it should be, so this is an attempt to see what 100hrs of cleaner training data can yield.
 * Concerns:

Week Ending March 18, 2014
March 16th:
 * Task:

Determine whether using genTrans5 vs. genTrans8 makes a significant difference in the word error rate, while also using the new dictionary, switchboard.dic.

March 17th:
 * Create the LM
 * Run decode on 0212
 * Results:

Train ran successfully, hoping to get a successful decode.

March 18th:

Mimicking Experiment 0209, we will generate a new acoustic model using the same dictionary and parameters, except with genTrans5.pl, to compare the results.

Training: 32 min. Decode: RT = 1.08, WER = 33.8.

SYSTEM SUMMARY PERCENTAGES by SPEAKER (hyp.trans)

SPKR      # Snt   # Wrd   Corr   Sub    Del   Ins    Err    S.Err
Sum/Avg   4659    68616   82.2   12.7   5.1   16.0   33.8   94.1
Mean      58.2    857.7   82.0   13.1   4.9   17.4   35.4   94.8
S.D.      22.1    330.0    5.9    4.6   2.1    8.3   11.3    6.2
Median    55.5    813.0   82.9   12.1   4.4   16.7   33.6   96.9

So this basically proves that genTrans8 is better than genTrans5, and that the new dictionary is more robust as well. Yesterday Colby J ran an experiment using the new dictionary, switchboard.dic, which differs from our current best dictionary because it includes:
 * Plan:
 * All things in brackets:
 * Incomplete words
 * Laughter
 * Words that are difficult to make out, but the transcriber made a guess at the possibilities
 * Also doesn't include lexical stresses
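As a sanity check on the decode scoring summary above: the reported Err column is just the sum of the Sub, Del, and Ins percentages, e.g. for the Sum/Avg row:

```shell
# WER = Sub + Del + Ins (each already a percentage of reference words);
# the values are taken from the Sum/Avg row of the scoring output.
echo "12.7 5.1 16.0" | awk '{printf "WER = %.1f\n", $1 + $2 + $3}'
# prints: WER = 33.8
```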

March 17th:

Run the decode using run_decode2.pl and score the results.

With that in mind, we want to see whether all those key points affect the WER vs. the old dictionary. In Colby's experiment yesterday (0209), he used genTrans8, so we want to use the new dictionary with genTrans5 to see if there is a difference. If the new dictionary does not provide a better WER, we have to go back to the drawing board.
 * Concerns:

If it is in fact not better, we need to find a way to complete the old dictionary and improve it to get to where we need to be.

If we do find that the new dictionary is better, then it will be a big step in the right direction, because it will mean we need to determine what is better to have in our transcription file:


 * i.e.:
 * having or not having laughter
 * incomplete words, etc.

Week Ending March 25, 2014
March 23rd: Logged in. Read Logs.
 * Task:

March 24th:


 * Prep for a train on the full set of data (308hr)
 * Extracted a dev set
 * Then the eval set

March 24th:
 * Results:

Ran copySph.pl; we had a few errors, but after a couple of sudo cmds Colby and I were able to get the directories all set up.

Now we are prepared to run the train which we will do tomorrow, while we monitor it closely.

March 24th:
 * Plan:

Take the full 308hr transcription file and chop off the last 2hrs, which will give us our eval set. This will be our 'unseen' data that we will decode against. The hope is that by training on the full set of data we get a better overall train, and in turn a good decode on the unseen data.
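The split described above can be sketched with head/tail; toy line counts stand in for the real 308hr/2hr proportions:

```shell
# Carve an eval set off the end of a transcript: keep the first N-k lines
# for training, hold out the last k as 'unseen' eval data.
mkdir -p /tmp/fullset && cd /tmp/fullset
seq 1 10 | sed 's/^/utt /' > full.trans   # toy 10-line transcript

total=$(wc -l < full.trans)
hold=2                                    # toy stand-in for the last ~2hrs
head -n "$((total - hold))" full.trans > train.trans
tail -n "$hold" full.trans > eval.trans
wc -l train.trans eval.trans
```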
 * Concerns:

Week Ending April 1, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 8, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 15, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 22, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 29, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending May 6, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns: