Speech:Spring 2014 Brian Gailis Log



Week Ending February 4th, 2014

 * 2014.02.01 - Logged in-Details Below
 * 2014.02.02 - Logged in-Details Below
 * 2014.02.03 - Logged in-Details Below
 * 2014.02.04 - Logged in-Details Below


 * Task:
 * Maintain 4 logs
 * Change passwd in Caesar
 * Get SpEAK working locally on my machine
 * Review other logs, offer assistance where needed
 * Prepare notes for next meeting


 * Results:

  Options Indexes FollowSymLinks Includes ExecCGI
  AllowOverride All
  Order allow,deny
  Allow from all

  Alias /speak "F:/speak/php"
 * SpEAK page last updated 2013.06.22
 * 1) Its remaining items:
 * 2) Search function needs work
 * 3) Admin function - requires implementation
 *  Checked out SpEAK code
 * 1) On my machine locally
 * 2) Verified TortoiseSVN was installed
 * 3) Created local download directory
 * 4) From Windows Explorer, right-clicked the newly created download folder
 * 5) Selected TortoiseSVN-CheckOut
 * 6) In the URL of Repository:  https://speak.googlecode.com/svn/trunk/
 * 7) In the Checkout directory: verified path to newly created download folder
 * 8) Checkout Depth: Fully recursive
 * 9) Revision: HEAD revision radio button selected
 * 10) Clicked OK Button
 * 11) All of SpEAK downloaded to my machine locally
 *  Configured my local XAMPP to work with SpEAK
 * 1) Created backup of httpd.conf file \xampp\apache\conf\2014.02.01.httpd.conf
 * 2) Edited \xampp\apache\conf\httpd.conf
 * 3) Added directory entry defining root path of speak/php/index.ini
 * 4) Because I use a Windows machine, I had to change the backslashes to forward slashes in the path name
 *  Added an Alias to access the directory more quickly
 * 1) Created backup of httpd.conf file \xampp\apache\conf\2014.02.01.httpd.conf
 * 2) Edited \xampp\apache\conf\httpd.conf
 * 3) Added an Alias entry so my browser can use a simplified URL
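
The backslash-to-forward-slash conversion and the two httpd.conf entries described above could be generated instead of typed by hand. This is only a minimal Python sketch, not part of SpEAK: the function name `apache_alias_block` is hypothetical, and wrapping the listed directives in a `<Directory>` section is my assumption about how they were enclosed in httpd.conf.

```python
from pathlib import PureWindowsPath


def apache_alias_block(windows_path: str, alias: str) -> str:
    """Build an Alias entry plus a <Directory> section (Apache 2.2 style).

    Hypothetical helper: the directive values mirror the ones in this log,
    the <Directory> wrapper is an assumption.
    """
    # Apache on Windows wants forward slashes in path names.
    posix_path = PureWindowsPath(windows_path).as_posix()
    lines = [
        f'Alias {alias} "{posix_path}"',
        f'<Directory "{posix_path}">',
        "    Options Indexes FollowSymLinks Includes ExecCGI",
        "    AllowOverride All",
        "    Order allow,deny",
        "    Allow from all",
        "</Directory>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the block so it can be reviewed before pasting into httpd.conf.
    print(apache_alias_block(r"F:\speak\php", "/speak"))
```

Printing the block rather than writing httpd.conf directly keeps the manual backup step in the loop.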


 *  Created the SpEAK database
 * 1) Started XAMPP
 * 2) browsed to xampp root directory
 * 3) started xampp-control.exe
 * 4) Started Apache
 * 5) Started MySQL
 * 6) Created SpEAK database
 * 7) Opened command prompt
 * 8) changed directory to xampp\mysql\bin
 * 9) logged into mysql  mysql.exe -u root -p
 * 10) From the mysql prompt created the database
 * 11) SOURCE f:\createspeakuseranddb.sql
 * 12) Created the database tables
 * 13) SOURCE f:\createspeaktables.sql
 *  Started SpEAK page
 * 1) Opened Chrome
 * 2) Specified URL: http://localhost:8080/speak/login.php (my apache is configured for port 8080)
 * 3) SpEAK home page opened
 * 4) Entered the user name as defined in the createspeakuseranddb.sql
 * 5) Entered the password as outlined from Josh's log regarding the pword
 * 6) Logged into SpEAK no trouble

2014.02.02
 * 1) Attempted to access SpEAK via caesar.unh.edu
 * 2) Opened command prompt on my local public machine
 * 3) Used ping to verify public ipaddress for caesar.unh.edu
 * 4) Response back: 132.177.189.63
 * 5) Opened local browser
 * 6) Entered URL: caesar.unh.edu/speak
 * 7) Could not connect
 * 8) Entered URL: caesar.unh.edu
 * 9) Could not connect
 * It appears that caesar is not accessible via HTTP protocol
 * 1) Tried the secure HTTPS protocol and that did not work either
 * Logged into caesar via ssh
 * 1) Followed similar steps as Josh from his 2/1/2014 log but connected to the methusalix machine as noted under
 * 2) Using the documentation from speech->semester->spring2014->groups->experiment group->Assigned machines are: methusalix & verleihnix
 * 3) Neither MySQL nor Apache is installed on methusalix
 * 4) Attempted to connect to the machine that Josh's logs identify, miraculix
 * 5) Using the same method as Josh, I was able to log in to that machine
 * 6) Navigated the same directories
 * 7) Was able to log in to my FTP server from the miraculix /mnt/main/srv/www/vhosts/speak directory
 * 8) To answer one of Josh's questions, "How do I add files from my machine to there?"
 * 9) One method is to use FTP put/get to move files
 * 10) However, we should all be working with the same file set, so a better way might be to use the repository and check files in and out when needed
 * 11) To answer the other question, "Is the code repo the same?": I don't think it matters, because we're all going to use the same code set moving forward, the one identified in Google

2014.02.03
 * Reviewed the SpEAK documentation further in hope to find other good nuggets
 * 1) Review from SpEAK home page
 * 2) Selected Semester
 * 3) Selected 2012
 * 4) Reviewed the different areas
 * 5) Prof Jonas has comments in several pages requesting more information
 * 6) For Example http://foss.unh.edu/projects/index.php/SpEAK:Spring_2012_Notes_Summary
 * 7) Here, he comments "Details of how to access code base and deployment environment can be accessed and how the information is organized would be very helpful..."
 * 8) The System Design Document identifies more tables than the create db script defines. Josh may have a point regarding a starting point and code repos; it might be worth looking into further
 * 9) For example, http://foss.unh.edu/projects/index.php/SpEAK:Spring_2012_SDD
 * 10) Curious, there are many references to something called STEM and there is even a public page for it:
 * 11) http://stem.unh.edu/speak/login.php
 * 12) Tried the login but couldn't login and I'm not able to locate a login for it...  very curious, more investigation may be needed here....

2014.02.04
 * Reviewed other logs
 * 1) Spoke with Mike, and we tested google hangout for on-line communication
 * 2) We also talked about the group not using Google Groups but rather using the MediaWiki group logs


 * Plan:
 * 1) Review previous logs and SpEAK pages
 * 2) Checkout SpEAK Code
 * 3) Configure SpEAK locally on my machine
 * 4) Verify SpEAK works and what it currently does


 * Concerns:
 * 1) The code is a mess and poorly written, can use a lot of clean up
 * 2) Documentation is scattered and difficult to follow
 * 3) Not enough direction for strong collaborative team efforts (As of yet)

Week Ending February 11, 2014

 * 2014.02.08 - Logged in- Details below
 * 2014.02.09 - Logged in- Details below
 * 2014.02.10 - Logged in- Details below
 * 2014.02.11 - Logged in- Details below
 * Task:
 * 1) Review Speech: Spring 2014 Experiment Group http://foss.unh.edu/projects/index.php/Speech:Spring_2014_Experiment_Group
 * 2) Review Speech: Training http://foss.unh.edu/projects/index.php/Speech:Training
 * 3) Review Speech: Spring 2014 Proposal http://foss.unh.edu/projects/index.php/Speech:Spring_2014_Proposal
 * 4) Review Individual logs
 * 5) Identify areas of concern and update accordingly (See Concerns Below)


 * Results:


 * 2014.02.08
 * Speech: Spring 2014 Experiment Group
 * 1) Reviewed the items outlined under Experiment Group Immediate Goals
 * 2) Followed the bullet point for running  a train on an experiment (NOTE: did not actually run a train, just followed the instruction)


 * Speech: Training
 * 1) Read
 * 2) Found the outline a bit confusing; it does a good job as a step-by-step procedure, but there are very few, if any, notes that explain what the user is trying to accomplish
 * 3) Also, the instructions assume that the user is already logged in, on the proper machine, and at the right location
 * 4) This is important to note: a beginner is not going to have a clue about any of this and will blindly follow the instructions without knowing what they're doing or why, provided they even get it to work by miraculously beginning at the right place.


 * Speech: Spring 2014 Proposal
 * 1) Browsed the proposal
 * 2) Nothing has been defined for the experiment group


 * Review Individual logs
 * 1) Josh had identified several concerns regarding SpEAK
 * 2) Both Ray and Pauline are attempting to run trains

 * Environment: Windows Professional 7
 * Access: Remote
 * Protocol: SSH
 * Client Application: OpenSSH
 * Client Application: PuTTY

could not create directory /home/bgailis/.ssh
The authenticity of host caesar.unh.edu (132.177.189.63) can't be established
 * 2014.02.09
 * Updated the Team's Group Log and added a Team Member Schedule
 * Attempted to run a train
 * 1) Steps to run a train:
 * 2) Opened Windows Command Prompt
 * 3) Changed Directory to C:\Program Files\OpenSSH\bin
 * 4) From Command Prompted typed: ssh btj9@caesar.unh.edu
 * 5) Received an error message stating:

RSA key fingerprint is
Are you sure you want to continue connecting (yes/no)
Failed to add the host to the list of known hosts (/home/bgailis/.ssh/known_hosts)
 * 1) Typed: yes
 * 2) Error msg:
 * 1) Prompted for password btj9@caesar.unh.edu's password:
 * 2) Entered password
 * 3) System prompted back with last login and returned me to the prompt caesar sp14/btj9
 * 4) At this point, followed the Steps for running a Train found on http://foss.unh.edu/projects/index.php/Speech:Training
 * 5) cd /mnt/main/Exp
 * 6) ls
 * 7) verified the last known experiment number 0152
 * 8) mkdir 0153
 * 9) cd 0153
 * 10) /mnt/main/root/tools/SphinxTrain-1.0/scripts_pl/setup_SphinxTrain.pl -task 0153
 * 11) cd etc
 * 12) Created a back up of the sphinx_train.cfg file before editing
 * 13) cp sphinx_train.cfg sphinx_train.cfg.bak
 * 14) ls to verify the bak file was created
 * 15) vi sphinx_train.cfg
 * 16) Got an error message stating: E437: terminal capability “cm” required
 * 17) Something to consider: have the OS team install ncurses-term?
 * :set nu (this enables line numbers in vi)
 * 1) Attempted to edit but kept getting strange responses
 * :q! (exited vi and reverted to back up)
 * 1) rm sphinx_train.cfg
 * 2) cp sphinx_train.cfg.bak sphinx_train.cfg
 * 3) Started researching the issues with vi and editing; during my research the Caesar remote host kicked me out
 * 4) At this point I abandoned OpenSSH and reverted to PuTTY
 * 5) PuTTY did not give me the same errors as OpenSSH and offered a color display, which is terrible!
 * 6) vi sphinx_train.cfg (edits file)
 * :set nu (enables line numbers)
 * :6 (moves to the 6th line of the file)
 * 1) l (moved right to "train1")
 * 2) x (deleted train1)
 * 3) i (inserted 0153)
 * 4) ESC (move out of insert)
 * 5) j (moved down to line 7)
 * 6) h (moved to start of "/root...")
 * 7) x (deleted all text between quotes)
 * 8) i (inserted "/mnt/main/Exp/0153")
 * 9) h (moved  to start of "/root...")
 * 10) x (deleted all text between quotes)
 * 11) i (inserted "/mnt/main/Exp
 * 12) ESC (move out of insert)
 * :80 (move to line 80)
 * 1) i (move into insert mode, type hash mark at start of line)
 * 2) ESC (move out of insert)
 * 3) k (move up one line to line 79)
 * 4) x (delete the hash at start of line)
 * 5) ESC (to ensure out of any weirdness)
 * :x (to save changes and exit)
 * 1) UP ARROW (recalls last command at prompt, in this example called vi sphinx_train.cfg)
 * 2) ESC (to ensure out of any weirdness)
 * :1 (verify changes on lines 6 through 8)
 * 1) ESC
 * 2) q! (to quit)
 * 3) The instructions for "Generate the transcript and its associated audio-file list." are next
 * 4) Found these to be extremely confusing, so here's what I did
 * 5) /mnt/main/scripts/user/genTrans6.pl /mnt/main/corpus/switchboard/mini/train 0153
 * 6) Got a ton of error messages stating: "sox FAIL formats: can't open output file wav/temp.wav: No such file or directory"
 * 7) cd .. (had to change back one directory)
 * 8) /mnt/main/scripts/user/genTrans6.pl /mnt/main/corpus/switchboard/mini/train 0153
 * 9) File ran without error
 * 10) cd etc (moving to etc directory)
 * 11) /mnt/main/scripts/train/scripts_pl/pruneDictionary2.pl 0153_train.trans /mnt/main/corpus/dist/cmudict.0.6d 0153.dic
 * 12) The previous step for pruneDictionary took forever!!!!
 * 13) cp -i /mnt/main/root/tools/SphinxTrain-1.0/train1/etc/train1.filler 0153.filler
 * 14) cp -i /mnt/main/scripts/user/genPhones.csh .
 * 15) ./genPhones.csh 0153
 * 16) vi 0153.phone
 * :set nu (looked for the line number starting with S; in my test, line 56 started with T and line 55 started with SH. Since SIL needs to be added, a new line 56 is needed for the insert of SIL)
 * 1) O (inserted a line above the current line and put me into insert mode)
 * 2) SIL (inserted the characters SIL on line 56)
 * 3) ESC (escaped out of insert mode)
 * 4) j (moved down one line, verified order)
 * :x (save changes and exit)
 * 1) cd .. (move back one directory to the experiment base directory)
 * 2) /mnt/main/scripts/train/scripts_pl/make_feats.pl -ctl /mnt/main/Exp/0153/etc/0153_train.fileids
 * 3) A bunch of .sph file types were created under /mnt/main/Exp/0153/wav/sw*.sph
 * 4) /mnt/main/scripts/train/scripts_pl/RunAll.pl
 * 5) Response back: Something failed: (/mnt/main/Exp/0153/scripts_pl/00.verify/verify_all.pl)
 * 6) The instructions state that most scripts fail the first time and this is normal?  That's crap, failure should never be normal....
 * 7) lynx 0153.html
 * 8) There were a lot of WARNING messages
 * 9) Went to the http://foss.unh.edu/projects/index.php/Speech:Training#Issue_1:
 * 10) This area references the training not finding words from the transcript in the dictionary file
 * 11) Following the instructions to view the missing words, I was not able to locate the list the instructions reference, so I read through each log entry instead
 * 12) There has to be an easier way...  Maybe introduce a Parser?
 * 13) exited the program as I was at the end of a train and nothing more to do except append to the dictionary followed by rinse and repeat...
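
One possible shape for the parser idea above, as a minimal Python sketch. It assumes the training log reports each missing word on a line worded like "WARNING: This word: X was in the transcript file, but is not in the dictionary"; that wording is my assumption and the pattern should be adjusted to whatever 0153.html actually contains. The name `missing_words` is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed wording of the warning line; change this if the real log differs.
MISSING_WORD = re.compile(
    r"WARNING: This word: (\S+) was in the transcript file, "
    r"but is not in the dictionary"
)


def missing_words(log_lines: Iterable[str]) -> List[str]:
    """Collect each missing word once and return them sorted."""
    found = set()
    for line in log_lines:
        match = MISSING_WORD.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    # Made-up sample lines, only to show the call.
    sample = [
        "WARNING: This word: UH-HUH was in the transcript file, "
        "but is not in the dictionary",
        "some other log line",
    ]
    for word in missing_words(sample):
        print(word)
```

Passing an open file handle for the log works the same way as the sample list, since the function only iterates over lines.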


 * 2014.02.10-Reviewed Logs
 * 2014.02.11-Reviewed Logs


 * Plan:
 * 2014.02.08-Review Logs
 * 2014.02.09-Try a train
 * 2014.02.10-Review Logs
 * 2014.02.11-Make updates and corrections as identified from other's logs and previous work from the 8th, 9th, and 10th
 * Concerns:


 * 2014.02.08
 * Speech: Spring 2014 Experiment Group
 * 1) No concerns with the goals, they follow as identified from Wednesday's meeting (2/5)
 * 2) One concern thus far is the project's overall direction. I don't see how the next semester is going to push the project forward, but that might be due to my ignorance, as it is still early in the semester to see the ultimate goal


 * Speech: Training
 * 1) Found the outline a bit confusing; it does a good job as a step-by-step procedure, but the instructions assume that the user is already logged in, on the proper machine, and at the right location
 * This is an important item to note because a beginner is not going to have a clue about any of this and will blindly follow the instructions.


 * Speech: Spring 2014 Proposal
 * 1) The Experiment group has been in contact with the Modeling group, and there are several open items concerning automation. This may be the reason the Experiment group has yet to contribute to the Semester Proposal: the group is waiting on confirmation from Modeling


 * Review Individual logs
 * 1) No concerns here as of yet
 * 2) Pauline and Josh's efforts I think are paying off and will provide the experiment group with good direction moving forward


 * 2014.02.09
 * Running a train following instructions: http://foss.unh.edu/projects/index.php/Speech:Training
 * 1) Step 2 says to use the last experiment number +1 but does not say how to find it. When writing the automation script, this will be an item to consider: read the directory, pull the last known (max) value, and add 1
 * 2) Step 4 gives a command with little explanation of what's actually occurring. The user must go on blind faith that what they're doing is correct; maybe adding a link to a page that explains what's occurring would be useful?
 * 3) Step 2 of "Set up the Sphinx Train Configuration file": it might be a better idea to introduce a backup of sphinx_train.cfg before actually editing it
 * 4) Creating a backup: cp sphinx_train.cfg sphinx_train.cfg.bak
 * 5) I know from my own experience with VI, it's not simple and depending on the terminal used, you may not get nice line numbers...  GEDIT might be a better solution for those who are more comfortable with a GUI text editor
 * 6) Using vi I got an error message stating: E437: terminal capability “cm” required
 * 7) This is caused by the terminal set in dumb mode, installing additional software will help to avoid this error
 * 8) a consideration of install ncurses-term for the OS team?
 * 9) yum install ncurses-term
 * 10) The instructions call for vi as the editor and give a link for common vi commands, but for those who have no interest in learning vi, offering some vi hints at each step would save time when editing directly
 * 11) For example, the instructions refer to editing lines 6 through 8 and 79 & 80, but vi does not show line numbers by default; that feature has to be enabled. Including instructions on enabling line numbers first will help with the remaining steps
 * 12) I found the instructions for "Generate the transcript and its associated audio-file list." very confusing
 * 13) These instructions make note of things not to do
 * 14) Then they provide examples of what could be done but say not to use them
 * 15) Then the commands to actually use reference paths that should be replaced, without telling you what those paths are or where/how to get them
 * 16) The instructions say to define a Corpus subset and tell you where to find them, along with a lot of text telling you not to use this or that... very confusing!!
 * 17) The instruction set "Set up the Sphinx Train Configuration file:" leaves the user in the .../etc directory, while the instruction set for "Generate the transcript and its associated audio-file list." requires you to start one directory back, at the experiment level. Be sure to move back one level before generating the transcript
 * 18) Perhaps update the instruction set "Set up the Sphinx Train Configuration file:" with a last item that moves the user back one directory, which ensures the user starts the next phase at the proper location (i.e., directory)
 * 19) The "Generate the phone list." instructions should end with a change back one directory ("cd .."), as they leave you in .../etc and you need to be in the experiment base directory for the next step, Generate Feats data.
 * 20) The "Start the Train!" instructions end with "Please note: Trains will usually fail the first time executing RunAll.pl!" ... Mmmm. nuff said...
 * 21) Identifying missing words is painful, we may want to think about a better parser or possibly find out how the html file is produced and re-engineer that to produce a list instead of gobbly gook....
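
For the automation script, the "last experiment number +1" step from the first concern above could look something like this. It is a minimal Python sketch under my own assumptions: experiment directories are named with digits only (as 0152 and 0153 are), and the function name `next_experiment_id` is hypothetical.

```python
import os
import re


def next_experiment_id(exp_root: str, width: int = 4) -> str:
    """Return the next zero-padded experiment number under exp_root.

    Only sub-directories whose names are all digits are counted.
    """
    numbers = [
        int(name)
        for name in os.listdir(exp_root)
        if re.fullmatch(r"\d+", name)
        and os.path.isdir(os.path.join(exp_root, name))
    ]
    # An empty experiment root starts at 1.
    return str(max(numbers, default=0) + 1).zfill(width)
```

On caesar this would be called as `next_experiment_id("/mnt/main/Exp")`. Two people running it at the same moment would get the same number, so the script should create the directory with a plain `os.mkdir`, which fails if the directory already exists, and retry in that case.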


 * 2014.02.10-Reviewed Logs, no concerns at this time
 * 2014.02.11-Reviewed Logs, no concerns at this time

Week Ending February 18, 2014

 * 2014.02.16 - Logged in- Details below
 * 2014.02.17 - Logged in- Details below
 * 2014.02.18 - Logged in- Details below


 * Task:
 * 1) Learn the structure of the experiment directory
 * 2) Log findings under personal log
 * 3) Where things are
 * 4) How they are stored
 * 5) What does sphinx create when it runs a train
 * 6) Collaborate with group at next meeting regarding findings
 * 7) Update logs accordingly


 * Results:
 * 2014.02.16
 * 1) Updated Brian time line for:
 * 2) Speech:Spring 2014 Proposal Group (section)
 * 3) http://foss.unh.edu/projects/index.php/Speech:Spring_2014_Proposal_Group
 * 4) Read the Speech Home Page
 * Moved to the Information page
 * 1) Moved to the Speech Software Functionality
 * 2) Found this page to be very useful and good place to start as it identifies all areas involved
 * Server named caesar with an OS: openSUSE11.3
 * The server has 5 software packages installed:
 * Sphinx Decoder
 * CMU Language Model Toolkit
 * Sphinx Trainer
 * CMU Dictionary
 * SCLITE
 * Moved to the System Software Setup page
 * This page is a review of the OS and the general concerns regarding the OS and its versions
 * 1) Moved to the Hardware Configuration page
 * 2) Did not find this page particularly useful as it only discusses the hardware
 * 3) Moved back to home page
 * 4) Moved to Experiments page
 * 5) Selected Experiment 1
 * 6) Shows a redirect to April 24th Group 1
 * 7) Selected redirect
 * 8) Under the Group Log, there is a note from Prof. Jonas stating:
 * 9)  Ah yes, finally we are getting to some really important issues. This is going to be the hard part...teasing out Sphinx dependencies based on their file hierarchy to get it working under ours...this may not be easy.
 * 10) Although this page was useful in that it identified some trouble areas, I was not able to locate any further direction...
 * 11) Moved back to the home page
 * 12) Then to Semesters
 * 13) Then to 2011
 * 14) Then to 2011 proposal page
 * 15) Found this page to be the most useful page so far as it outlines the start of Capstone and what's going on
 * 16) In the Building models section, information is given revealing the early foundations of the 2013 semester's Experiment group tasks
 * 17) Some things to note:
 * 18) Building models requires several steps
 * 19) Switchboard data needs to be organized into a suitable format
 * 20) Subsets get created, one of which is a proof of concept, referred to as a Mini set
 * 21) The remaining subsets are workable baseline sets of models, referred to as a Full set
 * 22) Sphinx (the software being used for this project) needs to be configured to generate acoustic models in a batch mode while synchronizing all machines in the Caesar stack
 * 23) The last step is to use Perl scripts to automate the experiment process
 * Switchboard corpus - hundreds of hours of recorded telephone conversations in native English
 * This is the data used to generate models during the training phase
 * Switchboard transcriptions - used as a base line to parse words from text transcription file that is then compared to the dictionary file
 * Training - comparison of words pulled from a transcript and compared against a dictionary
 * Mini Switchboard train set - a small portion of an audio file (typically an hour) that is used to create a transcript, the transcript is then "trained" where words are pulled out and compared to a dictionary
 * Full Train Set - 90% of a Switchboard corpus
 * Test set - set of data similar to a train set, except not used during training of a model, used to judge the accuracy of models during decoding
 * Dev set - used to tune the decode
 * Consists of 5% of the full Switchboard corpus not used in creating the train set
 * Dev Mini Test Set - 30 minute subset of the 5% Dev set, used for testing models created during training
 * Eval set - used afterwards to evaluate the final result obtained with the Dev set
 * Perl Scripts - data manipulation tools, used to automate the text parsing of Switchboard data
 * Parse transcriptions from Switchboard to Sphinx
 * Call on an application to down sample audio files
 * Generate new experiment directories according to the experiment directory structure
 * CMU Pronunciation Dictionary - speech recognition dictionary, found at www.speech.cs.cmu.edu/cgi-bin/cmudict
 * Scoring - The output from Sphinx compared to an actual transcription of the audio
 * the Sphinx training process - http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html - documentation on the sphinx training system
 * Probably the most useful page I've come across so far, and it isn't in the UNHM Speech wiki.... hmmm....
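
To make the train/dev/eval proportions noted above concrete, here is a minimal Python sketch of a 90% / 5% / remainder split over a list of conversation IDs. It is not the project's Perl tooling. Treating whatever is left after train and dev as the Eval set is my assumption, and the name `split_corpus` and the ID format in the usage note are made up.

```python
import random
from typing import Dict, List, Sequence


def split_corpus(ids: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Split conversation IDs into train (90%), dev (5%) and eval (the rest)."""
    shuffled = list(ids)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 90 // 100
    n_dev = len(shuffled) * 5 // 100
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "eval": shuffled[n_train + n_dev:],
    }
```

For 100 made-up IDs this gives 90 train, 5 dev and 5 eval conversations, with no ID appearing in more than one set.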


 * 2014.02.17
 * NOTED AND DOCUMENTED VALID INFO FROM EVERY STUDENT LOG FROM Speech:Spring 2011:
 * Chris: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Chris_Log
 * 1) /SphinxTrain1 - where stuff is installed
 * 2) Compile SphinxTrain - ./autogen.sh
 * 3) Tutorial setup - perl scripts_pl/setup_tutorial.pl an4
 * 4) SphinxBase & Sphinx3 set up - ./autogen.sh (SVN), ./configure (Tar Ball)
 * 5) Compile SPHINX-3 - ./autogen.sh --prefix=`pwd`/build --with-sphinxbase=`pwd`/../sphinxbase (SVN), ./configure --prefix=`pwd`/build --with-sphinxbase=`pwd`/../sphinxbase (Tar Ball)
 * 6) Running Trainer - perl scripts_pl/make_feats.pl -ctl etc/an4_train.fileids, perl scripts_pl/RunAll.pl
 * 7) Decoding it - perl scripts_pl/make_feats.pl -ctl etc/an4_test.fileids, perl scripts_pl/decode/slave.pl
 * 8) /root/speechtools/SphinxTrain-1.0/time/etc/ - transcriptions (etc/time.transcriptions)
 * 9) /root/speechtools/SphinxTrain-1.0/time/etc/ - sph files in wav
 * 10) /root/speechtools/SphinxTrain-1.0/train1/genFileIDs_withoutPath.sh - script to generate the fileids file to generate your features.


 * Matt: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Matt_Log
 * Matt notes that he found a file labeled Tinydoc.txt that explains almost how the whole training process works.
 * Tinydoc.txt /mnt/main/root/tools/SphinxTrain-1.0/doc
 * This isn’t flawless and errors will show. This is how a basic train is run:
 * 1) You start out by making a new directory (mkdir time)
 * 2) Then you go into that folder.
 * 3) Once in there, you run (perl $SPHINXTRAINDIR/scripts_pl/setup_SphinxTrain.pl -task "name of directory you created")
 * 4) The main config file is put into etc/sphinx_train.cfg
 * 5) You put the wav files that are needed into the wav/ folder that is now in the structure of the new directory that you created.
 * 6) In the file etc/"name of folder".fileids you put the list of all the wav files that are needed, like wav0001 wav0002 wav0003 etc.
 * 7) Then in the file etc/"name of folder".phone you enter all the phone names, like AA EE AE etc.
 * 8) Next you have a word transcription of each file, which is under etc/"name of file".transcription; each one must end in (FILEID)
 * 9) An example is THE TIME IS NOW … (WAV0001)
 * 10) Next you have an etc/"name of file".filler file. This has a list of the filler words and their pronunciation (using the phones)
 * 11) An example is SIL SIL SIL /NOISE/ +NOISE+
 * 12) After this we will need a dictionary, which will be created from the switchboard files.
 * 13) Considering we are doing this on our own, we will need to use bin/make_dict and etc/"name of file".transcription
 * 14) This will then create the files etc/word.known and etc/word.unknown
 * 15) Once we like the dictionary that we created, run the command mv etc/word.known etc/"name of folder".dic
 * 16) Then we make the melcep feature files with the command (perl scripts_pl/make_feats.pl -ctl etc/"name of folder".fileids)
 * 17) Now we can start on the basic perl scripts.
 * 18) Their results will be put in perl_”name of folder”.html which we can view as things progress.
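
Matt's notes say every transcription line must end in (FILEID) and that the .fileids file lists the wav files. A quick consistency check could catch mismatches before a train is started. This is a minimal Python sketch, not a SphinxTrain tool: it assumes the two files list utterances in the same order and compares IDs case-insensitively (the notes show wav0001 in one file and WAV0001 in the other). The name `check_transcription` is hypothetical.

```python
import re
from typing import List, Sequence

# Matches a trailing "(FILEID)" at the end of a transcription line.
UTTERANCE_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def check_transcription(
    transcript_lines: Sequence[str], file_ids: Sequence[str]
) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if len(transcript_lines) != len(file_ids):
        problems.append(
            f"{len(transcript_lines)} transcript lines "
            f"but {len(file_ids)} file ids"
        )
    pairs = zip(transcript_lines, file_ids)
    for number, (line, file_id) in enumerate(pairs, start=1):
        match = UTTERANCE_ID.search(line)
        if match is None:
            problems.append(f"line {number}: no trailing (FILEID)")
        elif match.group(1).lower() != file_id.strip().lower():
            problems.append(
                f"line {number}: ({match.group(1)}) does not match "
                f"{file_id.strip()}"
            )
    return problems
```

An empty result means every transcription line carries the ID found at the same position in the fileids list.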

 * Brian: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Brian_Log
 * 1) /root/speechtools/SphinxTrain-1.0/readme.txt - installation instructions
 * 2) /root/speechtools/SphinxTrain-1.0/doc/tinydoc.txt - Brief illustration of install process
 * 3) miraculix:~/decodeFiles - dictionary
 * 4) http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html - information
 * 5) http://www.isle.illinois.edu/sst/courses/minicourses/2009/lecture1.pdf - Describes the process of converting sph to wav.
 * 6) sox sw02001.sph sw02001.wav trim 0 00:50 - convert a section of the sph file to wav.
 * 7) sw02001.sph is the sph filename, sw02001.wav is the output filename, 0 is the start time, and 00:50 is the length to keep (the same as the end time here, because the start is 0)
 * 8) Sphinxtrain config file - specifies two modes of training:
 * 9) first mode, Sphinxtrain uses only the local machine (Queue::Posix in sphinx_train.cfg)
 * 10) second mode, Sphinxtrain uses a PBS/Torque queue which is a server that distributes work among multiple computers
 * 11) requires the installation of a PBS/Torque server on one of the systems to distribute work between all of the systems
 * 12) Torque - is a batch job system (http://www.democritos.it/activities/IT-MC/documentation/newinterface/pages/runningcodes.html & http://www.bc.edu/offices/researchservices/cluster/torqueug.html)
 * 13) Torque How To - http://wiki.hpc.ufl.edu/doc/TorqueHowto
 * 14) Carnegie Mellon University tutorial  SPHINX system- http://www.speech.cs.cmu.edu/sphinx/tutorial.html
 * 15) training acoustic models using the Sphinx3 trainer - http://www.speech.cs.cmu.edu/sphinxman/fr4.html
 * 16) sphinxtrain -  feature file
 * 17) generate the fileID file:

#!/bin/sh
# Just pipe the output to a file with the extension fileid and you should be good.
for i in /root/speechtools/SphinxTrain-1.0/train1/wav/*.wav ; do echo ${i%%.wav} | sed 's#^.*/##'; done
 * 1) make_feat.pl - script specifies to use sph files
 * 2) generates a folder called ___BASE_DIR___
 * 3)  wav folder inside ___BASE_DIR___ folder - all of the sph files
 * 4) an4 script - doesn't actually generate anything
 * 5) training files - Phones will be treated as case sensitive.
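
For comparison, the shell loop recorded in Brian's notes (strip the directory and the .wav extension from every file in the wav folder) can be written as a short Python sketch. The function name `wav_file_ids` is hypothetical, and the sorted order is my choice rather than something the notes require.

```python
from pathlib import Path
from typing import List


def wav_file_ids(wav_dir: str) -> List[str]:
    """Return the base names (no directory, no .wav) of all wav files."""
    # Path.stem drops only the final extension, like ${i%%.wav} in the loop.
    return sorted(path.stem for path in Path(wav_dir).glob("*.wav"))
```

Writing the result to the fileids file is then one more line, for example `Path("etc/train1.fileids").write_text("\n".join(ids) + "\n")`, where the output file name is only an illustration.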


 * KC: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_KC_Log
 * 1) information on using new models - http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4/doc/UsingSphinxTrainModels.html
 * 2) to use new models (three steps):
 * 3) Defining a dictionary and a language model
 * 4) Defining a model and a model loader
 * 5) Configure a frontend (optional)
 * 6) SphinxTrain README.txt -  referred the reader to other website
 * 7) CMU LM switchboard - uses .wav files that are sampled at 8kHz
 * 8) Sphinx software - requires a .wav file that is sampled at 16kHz
 * 9) Training Acoustic Model For CMUSphinx - http://cmusphinx.sourceforge.net/wiki/tutorialam?s
 * 10) etc/feat.params - Configure Sound Feature Parameters
 * 11) etc/sphinx_decode.cfg - Configure Sound Feature Parameters
 * 12) Once changes have been made:
 * 13) copy the sphinx_decode.cfg file from the an4 directory to the Capstone directory
 * 14) edit the file and change any file names from an4 to the project name
 * 15) Converting sph audio files to wav - http://library.rice.edu/services/dmc/guides/linguistic/converting-sph-audio-files-to-wav
 * 16) CMU Switchboard transcription files - are in .sph format


 * James: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_James_Log
 * 1) svn on caesar - done using ssh tunneling, a feature that will result in the best overall security for caesar
 * 2) experiment directory structure - created by Scott Innes on April 3rd, 2011
 * 3) //media/data/Switchboard  - mini training set and a mini development set
 * /home/linux/Documents/timeFiles.pl - finds the full length of the data with a high degree of accuracy
 * 1) accurate subsets of the data can now be constructed for training using this updated length measurement


 * Nick: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Nick_Log
 * 1) language model - can be built from both a Vocabulary data set and ID 3-Grams or text can be used to build the Vocabulary data and the ID 3-Grams
 * 2) change_log.txt -  tools
 * 3) Firstly, the text file is converted to a word frequency file (.wfreq) with the following command: '../bin/text2wfreq < change_log.txt > change_log.wfreq'.
 * 4) This file contains a list of every word from the file with a count indicating how many occurrences were found
 * 5) Next, a Vocabulary file is created from the word count file with the following command: '../bin/wfreq2vocab < change_log.wfreq > change_log.vocab'
 * 6) This file contains a list of every unique word from the word frequency file.
 * 7) ID 3-gram file - the vocabulary or text file is necessary
 * 8) Used the vocabulary file that was created to make an ID 3-gram file - '../bin/text2idngram -vocab change_log.vocab > change_log.idngram'
 * 9) This seemed to hang on me for more than 10 minutes at 'Allocating memory for the n-gram buffer'.
 * 10) Language Model can be built with - '../bin/idngram2lm -idngram change_log.idngram -vocab change_log.vocab -binary change_log.binlm'
 * 11) This requires the idngram file and vocab file
 * 12) ms98_icsi_word.text - contains transcripts from switchboard
 * 13) Create an ID 3-gram file - http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Nick_Log - Week Ending April 5th, 2011
 * 14) /media folder -  parsing script - to convert the words from the transcriptions into vocab files.
 * 15) Create a word frequency file - ~/speechtools/CMU-Cam_Toolkit_v2/bin/text2wfreq  trans.wfreq
 * 16) Create a vocab file -  ~/speechtools/CMU-Cam_Toolkit_v2/bin/wfreq2vocab < trans.wfreq > trans.vocab
 * 17) Create an id 3 gram -  ~/speechtools/CMU-Cam_Toolkit_v2/bin/text2idngram -vocab trans.vocab -n 3  trans.idngram
 * 18) specify n gram of 3.
 * 19) create a language model with arpa and binary format
 * 20) ~/speechtools/CMU-Cam_Toolkit_v2/bin/idngram2lm -idngram trans.idngram -vocab trans.vocab -arpa trans.arpa
 * 21) ~/speechtools/CMU-Cam_Toolkit_v2/bin/idngram2lm -idngram trans.idngram -vocab trans.vocab -binary trans.binlm
 * 22) create a trans folder to contain all the items mkdir trans
 * 23) make a tarball of the files tar -cpf trans.tar trans.*
 * 24) send it to caesar under the media folder sftp 192.168.10.1 put trans.tar /media
 * 25) The files are on Caesar under /media/data/trans
 * 26) The language model files are the trans.arpa and trans.binlm
 * 27) /media/data/trans/CreateLanguageModelFromText.perl:
 * 28) language model script - http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Nick_Log - Week Ending April 19th, 2011
 * 29) To execute: perl CreateLanguageModelFromText.perl inFile outFile
 * 30) ParseTranscript.perl has to be in the same directory as this script
 * 31) This script will need to be changed or branched to accept a vocab file
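
The toolkit steps in Nick's notes (text to word frequencies, word frequencies to vocabulary, text plus vocabulary to ID 3-grams) can be illustrated with a small Python sketch. This is not the CMU-Cambridge toolkit and does not reproduce its file formats. The function names, the use of id 0 for unknown words, and the sample sentence are assumptions made purely for illustration.

```python
from collections import Counter


def word_frequencies(text):
    """Count how often each whitespace-separated word occurs (the idea behind a .wfreq file)."""
    return Counter(text.split())


def vocabulary(wfreq):
    """Return the sorted list of unique words (the idea behind a .vocab file)."""
    return sorted(wfreq)


def id_trigrams(text, vocab):
    """Map words to integer ids and count every run of three consecutive ids."""
    # Id 0 is reserved here for out-of-vocabulary words (an assumption of this sketch).
    word_to_id = {word: index + 1 for index, word in enumerate(vocab)}
    ids = [word_to_id.get(word, 0) for word in text.split()]
    return Counter(zip(ids, ids[1:], ids[2:]))


# Hypothetical sample text standing in for a transcript file.
sample = "the cat sat on the mat the cat sat"
wfreq = word_frequencies(sample)
vocab = vocabulary(wfreq)
trigrams = id_trigrams(sample, vocab)
print(wfreq["the"], vocab, trigrams.most_common(1))
```

Each function corresponds to one stage of the pipeline, so the output of one stage is the input of the next, the same way the .wfreq, .vocab and .idngram files chain together.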


 * Corey: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Corey_Log
 * 1) sphinx dictionary - http://www.speech.cs.cmu.edu/sphinx/models/hub4opensrc_jan2002/cmudict.06d
 * 2) "dictionary" directory on idefix
 * 3) original transcripts - ms98_icsi_word.text
 * 4) dict.perl - is the perl script
 * 5) uniq_words.txt - the list of unique words from the original transcripts.
 * 6) create_uniq_words.sh - bash shell script - used to create "uniq_words.txt"
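
Corey's log does not show the contents of create_uniq_words.sh, so the following is only a sketch of the general idea of building a unique word list from transcript lines. The line layout (an utterance id and two timestamps before the words), the `skip_fields` parameter, and the sample lines are all assumptions.

```python
def unique_words(lines, skip_fields=0):
    """Return the sorted unique words, ignoring the first skip_fields tokens of each line."""
    words = set()
    for line in lines:
        tokens = line.split()
        # Lower-casing merges "Hello" and "hello" into one entry (a choice made for this sketch).
        words.update(token.lower() for token in tokens[skip_fields:])
    return sorted(words)


# Hypothetical transcript lines: utterance id, start time, end time, then the words.
transcript = [
    "utt-0001 0.00 1.25 hello there",
    "utt-0002 1.25 2.50 Hello again",
]
for word in unique_words(transcript, skip_fields=3):
    print(word)
```

A list like this is what would be looked up against the pronunciation dictionary to find words that still need entries.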


 * Scott: http://foss.unh.edu/projects/index.php/Speech:Spring_2011_Scott_Log
 * 1) Given the nature of the work we are doing...  it seems that a convenient way would be to use a web application that uses PHP and SQL
 * 2)  Mike-jonas 14:30, 2 April 2011 (UTC): not working with a database but with a directory structure containing data
 * 3) http://www.perl.org - great starting point to get to know PERL
 * 4) Brian (bmq9) wrote a script and put up on Foss that - Takes the transcript, parses a line, creates a directory for audio/transcript and pulls out the audio for that conversation
 * 5) The experiment directory layout is a little different from what the script expects, but that shouldn't be hard to fix. It then parses and tries to fix that piece of the conversation. It's written in Perl
 * 6) perldoc.perl.org - reference documentation
 * 7) "Learning Perl" from Safari books - pretty basic
 * 8) use the chmod unix command to have the system recognize the script as a program
 * 9) ExpDir.pl -  ExpDir.pl onto methusalix, perl ExpDir.pl runs it..... WORTH REVIEWING THIS FILE?


 * Speech:Summer 2011
 * Brian Avery: http://foss.unh.edu/projects/index.php/Speech:Summer_2011_Brian_Log
 * 1) Wrote training guide - uploading files to foss ????
 * 2) the language model does not rely on the dictionary
 * 3) sclite under sctk - Speech Recognition Scoring Toolkit (SCTK) Version 2.4.0 - http://www.itl.nist.gov/iad/mig//tools/
 * 4) the language model - is where the probability of a specific word occurring in a specific situation is set up (a certain word is more likely to occur after a word than another word would be)
 * 5) language model information: http://en.wikipedia.org/wiki/Language_model
 * 6) That document appears to explain what each of the executables do.
 * 7) convert text to a word frequency file - text2wfreq filename.txt
 * 8) create a vocab file from word count - wfreq2vocab wfreqfile.wfreq
 * 9) create 3-gram file using vocab or text - text2idngram -vocab vocab file > filename.idngram
 * 10) build language model - idngram2lm -idngram idngramfile.idngram -vocab vocabfile.vocab -binary filename.binlm
 * 11) languageModel/etc - copied stripped transcript
 * 12) to generate wfreq - text2wfreq < transcript.text > transcript.wfreq - Appears to be a count of the number of times that each word appears in the text.
 * 13) to generate vocab - wfreq2vocab < transcript.wfreq > transcript.vocab - Appears to be a dictionary of the words in the text
 * 14) to generate idngram - text2idngram -vocab transcript.vocab -idngram transcript.idngram
 * 15) text2idngram - text2idngram -vocab transcript.vocab -idngram transcript.idngram < transcript.text
 * 16) language model - idngram2lm -idngram transcript.idngram -vocab transcript.vocab -binary transcript.binlm
 * 17) Attempted a decode
 * 18) Copied the language model up to etc
 * 19) command: sphinx3_decode -hmm model_parameters/train1.cd_cont_1000/ -lm etc/transcript.binlm -dict etc/train1.dic -fdict etc/train1.filler -ctl etc/train1_train.fileids -cepdir wav -cepext .sph > ~/decodeOutput.txt
 * 20) decode script: see the decode section of Brian Avery's Summer 2011 log towards the bottom, lots of detail here on the script and decoding
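
The note above that a language model records how likely a word is to follow another word can be made concrete with a minimal bigram sketch. It uses plain relative frequencies with no smoothing, which is a simplification chosen for this example and not a description of how idngram2lm computes its model. The function name and the sample sentence are hypothetical.

```python
from collections import Counter


def bigram_probabilities(words):
    """Estimate P(next | previous) as count(previous, next) / count(previous as a left context)."""
    pair_counts = Counter(zip(words, words[1:]))
    # The final word never acts as a left context, so it is excluded from these counts.
    context_counts = Counter(words[:-1])
    return {
        (previous, following): count / context_counts[previous]
        for (previous, following), count in pair_counts.items()
    }


# Hypothetical sample text.
words = "i like tea i like coffee i drink tea".split()
probabilities = bigram_probabilities(words)
print(probabilities[("i", "like")], probabilities[("i", "drink")])
```

In this sample "like" follows "i" twice and "drink" follows it once, so the model rates "i like" as twice as likely as "i drink". A 3-gram model applies the same idea with two words of context instead of one.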


 * Jeff Knight: http://foss.unh.edu/projects/index.php/Speech:Summer_2011_Jeff_Knight
 * 1) Nothing really useful here, talks mostly about what he did with NFS Mount but not much detail on anything


 * Speech:Summer 2011 Training
 * http://foss.unh.edu/projects/index.php/Speech:Summer_2011_Training
 * 1) Well documented notes on training, step by step procedure
 * 2) Most of the procedure here has since been updated; however, it's a good place to learn a bit about what Capstone is all about


 * 1) Reviewed Speech:Spring 2014 Experiment Group
 * 2) http://foss.unh.edu/projects/index.php/Speech:Spring_2014_Experiment_Group
 * 3) Read through the goals of the group and decided to ignore anything regarding scripts and automation for now, as it has nothing to do with this week's immediate goal of learning the experiment directory.
 * 2014.02.18


 * Plan:
 * 2014.02.16
 * 1) Review of Speech:Spring 2014 Proposal Group
 * 2) http://foss.unh.edu/projects/index.php/Speech:Spring_2014_Proposal_Group
 * 3) Update logs
 * 2014.02.17
 * 1) Review of Speech:Spring 2014 Experiment Group
 * 2) http://foss.unh.edu/projects/index.php/Speech:Spring_2014_Experiment_Group
 * 3) Read through and documented all of the efforts of the 2011 year, see results above
 * 4) Update logs
 * 2014.02.18
 * 1) Update logs
 * Concerns:
 * 2014.02.16
 * 1) A lot of time passes between class and group discussions which makes working this project difficult.
 * 2) Seems each week our direction changes and we lose focus.
 * 3) Finding information is difficult as there is no flow to the Speech site, and a user stumbles through it without direction or purpose.
 * 2014.02.17
 * 1) The Speech:Spring 2014 Experiment Group page, which identifies the group's goals, seems to focus heavily on scripts and automation.  This may be important to the overall goal but is not too helpful in meeting our immediate needs, such as defining the environment of an experiment: the who, what, when and where of an experiment.  To me, this page is a bit misleading towards that goal and perhaps our group could discuss it further?
 * 2) The 2011 year has lots of good foundation information and wrote most of the first-generation scripts; however, for whatever reason most of the scripts shown in their wikis lack locations and file names.  Perhaps most or all of this work has been replaced, but it would have been nice to see the names of the files.  Recommendation: ask Prof Jonas to require a template for posting script files that includes script file name, location, and purpose.
 * 2014.02.18
 * 1) No concerns

Week Ending February 25, 2014

 * 2014.02.19: Wiki Table: http://en.wikipedia.org/wiki/Help:Table
 * 2014.02.20: Install Sphinx local: http://www.speech.cs.cmu.edu/sphinx/tutorial.html - NEXT: Setting up the data
 * 2014.02.20: Install Sphinx local: http://www.speech.cs.cmu.edu/sphinx/tutorial.html - How to perform a preliminary decode
 * 2014.02.24: Install Sphinx local: http://www.speech.cs.cmu.edu/sphinx/tutorial.html - Complete
 * 2014.02.25: Review of Josh's logs, review of my own logs


 * Once you have made all the changes desired, you must train a new set of models. You can accomplish this by re-running all the slave*.pl scripts from the directories /scripts_pl/00* through /scripts_pl/09*, or simply by running perl scripts_pl/RunAll.pl.
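
As a rough sketch of the first option above, the following Python lists the slave*.pl scripts in stage order and prints one `perl` command for each. The layout it relies on (stage directories whose names begin with 00 through 09, each holding a slave*.pl file) is taken from the sentence above. The function name is hypothetical, and the sketch only prints commands instead of running them.

```python
import glob
import os


def training_commands(scripts_dir):
    """Build one 'perl <script>' command per slave*.pl file, ordered by stage directory name."""
    # "0[0-9]*" matches directory names that start with 00 through 09.
    pattern = os.path.join(scripts_dir, "0[0-9]*", "slave*.pl")
    return [["perl", path] for path in sorted(glob.glob(pattern))]


for command in training_commands("scripts_pl"):
    # Printing instead of executing keeps this sketch free of side effects.
    print(" ".join(command))
```

Running scripts_pl/RunAll.pl remains the simpler route. Listing the stages this way is mainly useful for seeing the order in which they would run.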


 * The above will create a new setup by rerunning the SphinxTrain setup, then rerunning the decoder setup using the same decoder as used by the originating setup. It then copies the configuration files, which are located under etc, to the new setup, with the file names matching the new task's.


 * The copy_setup.pl script also copies the data, located under feat and wav, to the new location. If the dataset is large, the duplication may waste disk space. Editing the script to create a symbolic link to the data is a better option, but symbolic links are not equally well supported on all operating systems; on Windows, for example, creating them requires an NTFS volume and usually elevated privileges.
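
The trade-off between copying and linking the data can be sketched in Python as follows. This is not how copy_setup.pl works. It is a hypothetical helper that tries a symbolic link first and falls back to a full copy when the link cannot be created. The function name and the example paths are assumptions.

```python
import os
import shutil


def link_or_copy(source_dir, target_dir):
    """Symlink target_dir to source_dir; fall back to a full copy if linking fails."""
    try:
        os.symlink(os.path.abspath(source_dir), target_dir, target_is_directory=True)
        return "linked"
    except (OSError, NotImplementedError):
        # Linking can fail, for example on Windows without the required privilege.
        shutil.copytree(source_dir, target_dir)
        return "copied"


# Hypothetical usage: share the feature files of an existing task with a new one.
# link_or_copy("old_task/feat", "new_task/feat")
```

A link saves disk space, but both setups then share the same files, so a change made through one is visible through the other.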


 * Task:
 * 1) Define common terms


 * Results:
 * 1) Read through and follow instructions from the Carnegie Mellon University: http://www.speech.cs.cmu.edu/sphinx/tutorial.html site
 * 2) This site really explains what Capstone is all about and gives detailed explanation into everything


 * Plan:
 * 1) Define common terms in the structure of a table as it's easier than bullet points
 * 2) Create a virtual machine (vm) on my local machine
 * 3) Install OpenSUSE on my vm machine
 * 4) Follow the Carnegie Mellon University instructions for completing an experiment locally on my own machine
 * Concerns:
 * 1) Disorganization and lots of it
 * 2) Finding even my own logs to be overwhelming, need to be better organized

Week Ending March 4, 2014

 * 2014.03.02 - Logged in to review others' activity
 * 2014.03.03 - Logged in to review the progress on the Master Script as well as SpEAK
 * 2014.03.04 - Logged in, reviewed logs


 * Task:
 * Work on defining content
 * Update areas as needed.


 * Results:
 * Looking forward to seeing the work Josh has done with the Master Script
 * The rest of the team is making headway so all is well with the world!
 * This week I was not able to contribute anything and sorry to my team for that!


 * Plan:
 * Read additional logs and post important related material in one location


 * Concerns:
 * Wasn't able to do much of anything this week due to other priorities
 * Will need to devote double time next week

Week Ending March 18, 2014

 * Task:


 * Results:


 * SPRING BREAK


 * Plan:


 * Concerns:
 * SPRING BREAK and during this time will be on vacation from university projects

Week Ending March 25, 2014

 * 2014.03.22 - Logged in Updated Tasks
 * 2014.03.23 - Logged in Reviewed Team Logs
 * 2014.03.24 - Logged in Reviewed Josh Anderson's log specifically for how to use his script
 * 2014.03.25 - Logged in and attempted the master script again

 * Task:
 * 2014.03.22
 * 1) Update task assignments for this week
 * 2014.03.23
 * 1) Review other team members' logs
 * 2014.03.24
 * 1) Run an experiment using the newly created script file written by Josh Anderson (Spring 2014 Semester)
 * 2014.03.25
 * 1) Try running another experiment after correcting errors

 * Results:
 * 2014.03.22
 * 1) Was able to update all areas of my log: Tasks, Plan, and Concerns
 * 2014.03.23
 * 1) Reviewed other Experiment Group logs to figure out where they are and how I could contribute
 * The group seems well on their way performing individual tasks; however, both Josh and Pauline have mentioned content additions and clarifications. The group may need to get together to establish a unified plan on presentation.  We should probably get together on this at the end of the next class?
 * 2014.03.24
 * The following was performed from the following environment:


 * The following is the procedure for running an experiment from the master_run_train.pl script


 * 2014.03.25
 * 1) Updated the screenshot for figure 19
 * 2) Followed the script again but still got a failure when attempting to run RunAll.pl
 * 3) Not sure what's going on here; will need to follow up with the group tomorrow on what the above error is

 * Plan:
 * 2014.03.22
 * 1) Update task assignments for this week
 * Organize this week's log entries
 * Develop plans to achieve task completion for this week
 * 2014.03.23
 * 1) Review other team members' logs
 * Review Josh's, Pauline's, and Ray's logs
 * 2014.03.24
 * 1) Run an experiment using the newly created script file written by Josh Anderson (Spring 2014 Semester)
 * After review of Josh's logs, follow his instructions for using the script to generate an experiment
 * 2014.03.25
 * 1) Try Josh's script again using the suggestions given for success
 * Tried the script again, changing step 19
 * Concerns:
 * 2014.03.22
 * 1) Free Time: I have several major ongoing projects in my professional life as well as with the Internship course project.  It is highly unlikely that I'll be able to maintain the same effort level as I had in the first part of the semester.
 * 2014.03.23
 * 1) Content: A shared concern among the experiment group is organization and clarification of the data we've been adding to speech wiki.  We will need to get together, probably next class, to discuss this further.
 * 2014.03.24
 * 1) Script failed due to user error, passed suggestions to Josh regarding my step 19 and Step 23 for the RunAll.pl example and path
 * 2014.03.25
 * 1) Script still failed when attempting RunAll.pl. Will need to follow up with the group to figure out the trouble.


 * 2014.03.26
 * 1) On step 19, the example provided for 5hrTrain is not correct.  The syntax is: first_5hr/train

Week Ending April 1, 2014

 * 2014.03.27 - Logged in (detail below)
 * 2014.03.28 - Logged in (detail below)
 * 2014.03.29 - Logged in (detail below)
 * 2014.03.30


 * Task:
 * 2014.03.27
 * 1) Update tasks for Week Ending April 1, 2014
 * 2) Add content to the Experiment Setup page
 * 2014.03.28
 * 1) Update the Experiment page with the experiments that I've run
 * 2014.03.29
 * 1) Update the Experiment Setup page http://foss.unh.edu/projects/index.php/Speech:Exp
 * 2014.03.30
 * 1) Review others' logs and assist where I can


 * Results:
 * 2014.03.27
 * 1) Added content from my log to the Terminal page under experiment setup page
 * 2014.03.28
 * 1) Added content to experiment numbers:
 * 2) Mike Jonas had sent me an email requesting that I update the experiment labels for experiment log entries of 0234 & 0235.  He also asked that since these were not true experiments that I label them as accident.  Corrected the labels per Mike's request.
 * 2014.03.29
 * Updated the Experiment Setup page http://foss.unh.edu/projects/index.php/Speech:Exp
 * Added the Capstone Terms page
 * Added the External Resources and Utilities page
 * 2014.03.30


 * Plan:
 * 2014.03.27
 * 1) Update tasks for Week Ending April 1, 2014
 * 2) Add content to the Experiment Setup page
 * 2014.03.28
 * 1) Update the Experiment page with the experiments that I've run
 * 2014.03.29
 * 1) Update the pages specified in my tasks lists
 * 2014.03.30
 * 1) Review others' logs and assist where I can
 * Concerns:
 * 2014.03.27
 * 1) None
 * 2014.03.28
 * 1) There is a lot of redundant information within the speech wiki and the overall site is becoming more cluttered with redundant information.  For example, the details of the testing that I've done are now duplicated both here in my log as well as in the experiments page.  This is the primary reason for new students being overwhelmed when they enter into Capstone.
 * 2) Defining experiment log entries as Accident is not beneficial to any viewer of the experiment log.  I think labels should be descriptive of activity in the way I had labeled them but... I'll do what's been asked and follow a non-descriptive label convention.
 * 2014.03.29
 * 1) No concerns regarding the Experiment page
 * 2014.03.30

Week Ending April 8, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 15, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 22, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 29, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending May 6, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns: