Speech:Spring 2014 Sinisa Vidic Log



Week Ending February 4, 2014

 * Plan:
 * Task:
 * Read Wiki documentation
 * Read all the logs from the Systems Group members of last year's Capstone class.
 * Research Fedora 20 as a possible OS change for Caesar and all the droids.
 * Familiarize myself with UNIX/Linux OS and commands.

1/30
 * Results:
 * Read Logs
 * Tried to log in to Caesar earlier today but was unsuccessful. I assume the accounts haven't been created yet; I will try to log in again tomorrow.

2/2
 * Downloaded and installed VirtualBox on my local computer.
 * Downloaded and installed Fedora 20 (64-bit) in VirtualBox.
 * Learned some UNIX/Linux commands with the help of Google and the Speech:Unix page.
 * Used the Fedora 20 terminal to SSH into Caesar with my account. Changed my password with command  and generated the SSH keys (keygen) for my account with command.
 * Was able to SSH into Automatix from Caesar without a password after running keygen.

2/3
 * Read Logs
 * Read Help:FAQ and Help:Project_wiki_documentation

2/4
 * Downloaded and installed openSuse 13.1 (64-bit) to get a feel for it compared to Fedora 20.
 * The download was 4.3 GB, compared to only 953 MB for Fedora 20.
 * At first glance Fedora seems to come with a lot less software than openSuse, which ships with a bunch of games and financial software. This could be good or bad for openSuse: good in that it probably includes a lot of the software Sphinx needs to run properly, bad in that much of that software will be useless for our needs and a lot of time will be spent uninstalling it to save disk space and improve performance.
 * More research is needed to decide whether it is worth upgrading to a newer version of our current OS or switching to a completely new OS in Fedora. I plan on researching Sphinx next week, which will give me a better picture of what is required to run it.


 * Concerns:
 * With so little knowledge of UNIX/Linux, it is hard to judge which of the two will be more suitable for our needs without actually installing Sphinx (more research needed) and running it on both systems.

Week Ending February 11, 2014

 * Task:
 * Read logs from last year's class concerning the Fedora upgrade.
 * Read info on running trains and creating experiments
 * Work on the Systems Group portion of the Capstone proposal

2/8
 * Results:
 * Read logs

2/9
 * Read logs
 * Did some research on the keygen issue with Fedora on Rome, without success. Still waiting for access to Rome; as of right now my account on it hasn't been created.

2/10 - 2/11
 * Worked on the proposal
 * Contacted Valerie Therrien and Arwa Hamdi regarding the tasks assigned.
 * Updated the Systems Group log to include the timeline and task responsibilities for the group. (All dates are approximate.)
 * Googled some more for solutions to our Fedora keygen issue, with no luck.
 * Began setting up local VirtualBox machines to simulate our Caesar and Rome machines.
 * Spent a lot of time trying to find a solution to the Fedora keygen issue.
 * Setting up SSH on the openSuse and Fedora VirtualBox machines was simple and easy. Installed OpenSSH with command  (technically it only updated the version that came pre-installed with Fedora).
 * Started the sshd service with  which allowed the openSuse box to connect to Fedora using SSH.
 * Created the RSA keys on openSuse with  and then made a copy of id_rsa.pub named authorized_keys. Command used
 * Tried to upload the authorized_keys file to the same user (viper) on Fedora using  but it didn't work, so I transferred the file the old-fashioned way, using a "flash drive".
 * Tried logging in to Fedora, but it still asked for the password.
 * Disabled "PasswordAuthentication" in the /etc/ssh/sshd_config file on Fedora. Restarted the SSH service with  and tried to log in from openSuse. This time an error showed up: Permission denied (publickey,gssapi-with-mic). Googled a lot but was unable to find a working solution.
 * Went a step further and attempted to recreate a full Caesar/Rome network simulation. Read a lot of tutorials on how to set up an NFS server/client. With a lot of trial and error I was able to mount an openSuse /home directory onto Fedora to act as a shared user directory.
 * Unfortunately, I made a critical error by accidentally removing Fedora's root files, leaving Fedora usable only on a limited basis. At this point the only way to get it working properly is to reinstall Fedora, which I did not have time for.
 * Even though it was an unsuccessful day, I had the privilege of learning a lot in the process.
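A note on the `PasswordAuthentication` step above: when editing or comparing `sshd_config`, it helps to know which value sshd actually uses. Per the sshd_config man page, the first value obtained for a keyword is the one used and keywords are case-insensitive, so a later duplicate line has no effect. A minimal Python sketch of that rule, assuming a config with no `Include` lines and whitespace-separated values; it stops at the first `Match` block because those settings apply only conditionally. The function name `effective_settings` is hypothetical.

```python
def effective_settings(config_text):
    """Return {keyword: value} using sshd's first-value-wins rule.

    Simplifications (assumptions of this sketch): no Include support,
    keyword and value separated by whitespace, and parsing stops at the
    first Match block because its settings apply only conditionally.
    """
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "match":
            break
        settings.setdefault(keyword, value)  # first obtained value wins
    return settings


sample = """
# PasswordAuthentication yes
PasswordAuthentication no
PubkeyAuthentication yes
passwordauthentication yes
"""
print(effective_settings(sample)["passwordauthentication"])  # no
```

On a live machine, running `sshd -T` as root prints the effective configuration directly, which is more reliable than parsing the file by hand.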


 * Plan:
 * Will try to recreate our keygen issue using VirtualBox with openSuse and Fedora running (by 2/11/14).
 * Concerns:

Week Ending February 18, 2014

 * Task:
 * Finish brainstorming for the proposal
 * Write the proposal by Sunday (2/16) evening
 * Read information on doing experiments/running a train
 * Attempt to run an experiment on Automatix to gain experience and share the information with my teammates
 * Do a little research on the best hard drive format for backing up Linux files

2/14
 * Results:
Spent an hour finishing the brainstorming for the Systems Group portion of the proposal that I started a few days ago. I expect to finish the proposal and upload it to the wiki by Sunday evening. That will give the rest of the Proposal Group members time to review it and make any necessary changes before the due date.

2/16 Unfortunately, I was unable to work on the proposal this weekend. I apologize to all my teammates in the Systems Group and the Proposal Group. I will have the proposal done sometime tomorrow (Monday, 17th).

2/17 Due to yesterday's setback, my plan for today changed. I was planning to start reading guides on how to create an experiment and run a train using the Sphinx speech recognition software. Since I needed to finish the Systems Group portion of the proposal, that plan is pushed to tomorrow. Today I spent time writing the proposal and uploaded it to the Speech:Spring 2014 Proposal page for review by the other Proposal Group members. I have also asked my Systems Group teammates to read it and recommend any changes and/or additions if needed.

2/18 Today I spent time reading all about creating experiment directories and how to set up and run a train, run a decoder, and create a language model, on the following pages: Speech:Exp, Speech:Training, Speech:Run Decode, Speech:Create LM. Creating an experiment directory and a language model is straightforward, unlike running a train and decoder. The train and decoder seem to be complex beasts and will need to be handled delicately so as not to cause major file and system issues. I will ask the Modeling Group to confirm that the steps laid out in the above four guides are correct, so that the Systems Group can start the first of many experiments to come, testing Fedora and openSuse performance running Sphinx Speech Recognition.

Another task I did was search for the Fedora experiment that produced an "Error Percentage in 20's". First I looked at last year's Systems Group proposal to see whose responsibility such an experiment was, but unfortunately no such information was posted. Then I closely searched the logs of the four members of last year's Systems Group in the hope that one of them had posted about it if such an experiment was done. That search was unsuccessful as well. I was left with only one more option: manually looking through a hundred or so experiment logs in the Speech:Exps database. In the end no such experiment was found. It may never have been recorded (a number of experiments are missing), it may have been deleted, or it may never have been run. The only four recorded experiments that mention Fedora are from Eric Beikman and were done last summer. One of them set up the Fedora environment to be able to run a train, and another added words to the dictionary for one of the experiments. So technically only two experiments were run that produced an SCLite score, Speech:Exps 0115 and Speech:Exps 0117. Experiment 0115 produced an average error percentage of 40.5, while experiment 0117 did a bit better, producing a score of 33.8. Since there is no Fedora experiment with an error percentage in the 20's, I will need to ask Prof. Jonas for permission to use Eric's experiments to run our tests on the Automatix and Rome machines.
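For reference, the SCLite scores quoted here are word error rates: the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to turn the decoder's hypothesis into the reference transcript, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The Python sketch below computes that standard definition with a word-level edit distance; it is not SCLite's implementation, and the sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Percent WER: minimum word edits (S + D + I) divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("the cat sat down", "the cat sat town"))  # 25.0
```

The reference must be non-empty, since N is the denominator.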
 * Concerns:

Week Ending February 25, 2014

 * Task:
 * Research the experiment logs in the hope of finding an experiment with an error rate in the 20's
 * Use the 20's % error experiment as the base for a new experiment that will serve as the test method for the OS war between openSuse and Fedora

2/22 There was a miscommunication between Manager Jonas and me about which experiment to use as the basis for my Fedora and openSuse testing. I was under the impression that I needed to find an experiment run on Fedora with an error rate in the 20's, which I was unable to find. At the last Capstone meeting it became clear to me that Manager Jonas was looking for an experiment with an error rate in the 20's regardless of which OS it was run on. So I went back and researched the experiment logs again. Not only did I find one experiment with a word error rate in the 20's, I found four of them, and on top of that an experiment that produced a word error rate in the 10's.
 * Results:

The following experiments fit the requirement:
 * Speech:Exps 0104 - 29.3% error
 * Speech:Exps 0108 - 27.5% error
 * Speech:Exps 0109 - 26.7% error
 * Speech:Exps 0110 - 25.2% error
 * Speech:Exps 0111 - 17.0% error
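Choosing a base experiment from this list is a simple filter and sort. A short Python sketch using the error percentages exactly as listed above; holding them in a dictionary is just an illustrative choice.

```python
# Error percentages as recorded in the Speech:Exps logs listed above.
experiments = {
    "0104": 29.3,
    "0108": 27.5,
    "0109": 26.7,
    "0110": 25.2,
    "0111": 17.0,
}

# Candidates with an error rate in the 20's (20.0 <= error < 30.0), best first.
in_twenties = sorted(
    (exp for exp, err in experiments.items() if 20.0 <= err < 30.0),
    key=experiments.get,
)
print(in_twenties)  # ['0110', '0109', '0108', '0104']

# Overall lowest error rate, regardless of range.
best = min(experiments, key=experiments.get)
print(best, experiments[best])  # 0111 17.0
```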

2/23 I have been busy all day, so I haven't had time to attempt my first experiment, but I did manage to create a new experiment log, Speech:Exps 0189, for my practice experiment tomorrow (2/24). I decided to use Speech:Exps 0110 as my base for comparing openSuse and Fedora performance running the Sphinx Speech Recognition software.

2/24 I noticed that my Speech:Exps 0189 log created last night was taken over by Forrset. I'm not sure why, but I assume he had already started experiment 0189 without creating the log beforehand. Since I had only created the log, no major damage was done, and he was kind enough to move my log to Speech:Exps 0190.

I went to log in to Automatix with my user name and was prompted to enter a password, which shouldn't happen since we have passwordless logins on this machine and it worked fine the other day. When I entered my password it wouldn't log me in; it replied that the RSA key was wrong and permission was denied. Before I could continue with my first experiment attempt I needed to troubleshoot these two bugs. First I reran the keygen command on Caesar to rebuild my RSA keys in case they had been corrupted. I tried to log in to Automatix again and got the same problem: denied. Then I logged in as root on Automatix and was able to do so. I checked whether my user name is in the  file, and it was there. Next I changed my user's password with command  and was successful in doing so. I logged off as root and attempted to log in as my user; this time I was successful, but yet another error was displayed on the terminal:  This time I was sure the problem was that the shared folder from Caesar had been disconnected on Automatix. I ran command  to look at the mounted file systems on the machine, and sure enough there was no shared file system from Caesar. I logged back in as root with command  and then ran the command   which mounted the   directory from Caesar and solved all the issues encountered.
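The root cause in this entry was that the file system shared from Caesar was no longer mounted on Automatix. A minimal Python sketch of that check, assuming Linux's `/proc/mounts` format (device, mount point, file system type, options, ...). The device and mount point in the sample text are hypothetical placeholders, since the actual directory isn't recorded above; on a real machine you would pass the contents of `/proc/mounts` instead.

```python
def mount_points(mounts_text):
    """Map mount point -> (device, fs_type) from /proc/mounts-style text.

    /proc/mounts escapes spaces in paths as \\040; only that escape is
    decoded here. If a path is mounted more than once, the last
    (topmost) entry wins.
    """
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        table[mount_point.replace("\\040", " ")] = (device, fs_type)
    return table


def is_mounted(mount_point, mounts_text):
    return mount_point in mount_points(mounts_text)


# Hypothetical sample; on a real machine pass open("/proc/mounts").read().
sample = (
    "caesar:/export/shared /mnt/shared nfs rw,relatime 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
print(is_mounted("/mnt/shared", sample))  # True
```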

Week Ending March 4, 2014

 * Task:
 * Update my Exp 0190 log with results of the experiment
 * Run a new experiment, this time truly based on Exp 0110

3/3 Last week I set up experiment 0190 to run as my first test experiment. I planned on running a duplicate of experiment 0110, which was run by Eric last year and produced a best score of 25.2% error, but at the last second I decided to run a brand new experiment. I followed the guidelines on the Speech:Training page to set up the experiment. It took some time to follow all of the steps in the guide, as I was trying to do each one carefully. In the end I was able to create an experiment 0190 directory in, and I successfully updated the Sphinx Training Configuration file   to my experiment's needs. Next I ran a  script to generate transcriptions and audio files for my experiment. I went with the  corpus. Then it was time to generate a dictionary with the  script; this process took some time to complete. The last two steps were to generate a phone list and feats data. To generate the phone list, I copied  script into my experiment folder and then ran it. This produced a  file in my   directory. Generating feats was even simpler: I ran the make_feats.pl script as follows. This created a  file to be used in running a train.
 * Results:

After all that, it was finally time to run the 0190 experiment and hope that no errors had been made during the setup phase. I ran the train script. At first it seemed like it would be a clean run, but then WARNING after WARNING started popping up on the terminal. All of the warnings came from words missing from the dictionary file, and the script refused to run unless all the words were in the dictionary. Looking at the failed experiment log, I found that the dictionary was missing 500 or so words. I did not have time to add all the missing words to the dictionary, so I decided to copy the dictionary file from experiment 0028, which the Run a Train guide is based on, but that also turned out to be a failure, which made my first experiment a failure as well.
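The train refused to run because transcript words were missing from the dictionary. A minimal Python sketch for listing those words up front, assuming Sphinx-style files: transcript lines like `<s> WORD WORD </s> (utterance-id)` and dictionary lines like `WORD PH1 PH2 ...`, with alternate pronunciations written `WORD(2)`. The function names and the default filler set are assumptions; in practice the experiment's own filler file should supply the filler words, and the sample data is made up.

```python
import re
from collections import Counter


def dictionary_words(dict_lines):
    """Words defined in a Sphinx-style dictionary (WORD PHONE PHONE ...)."""
    words = set()
    for line in dict_lines:
        fields = line.split()
        if fields:
            # "WORD(2)" is an alternate pronunciation of "WORD".
            words.add(re.sub(r"\(\d+\)$", "", fields[0]))
    return words


def missing_words(transcript_lines, dict_lines, fillers=("<s>", "</s>", "<sil>")):
    """Transcript words absent from the dictionary, most frequent first."""
    known = dictionary_words(dict_lines) | set(fillers)
    counts = Counter()
    for line in transcript_lines:
        # Drop the trailing utterance id in parentheses.
        text = re.sub(r"\([^)]*\)\s*$", "", line)
        counts.update(w for w in text.split() if w not in known)
    # Most frequent first, ties broken alphabetically.
    return sorted(counts, key=lambda w: (-counts[w], w))


dict_lines = ["HELLO HH AH L OW", "WORLD W ER L D", "WORLD(2) W AO R L D"]
transcript = [
    "<s> HELLO WORLD </s> (utt1)",
    "<s> HELLO THERE FRIEND </s> (utt2)",
    "<s> THERE </s> (utt3)",
]
print(missing_words(transcript, dict_lines))  # ['THERE', 'FRIEND']
```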

3/4 After gaining some experience with setting up an experiment, it was time to finally attempt to run a duplicate of Speech:Exps 0110 on the Rome machine in order to test Fedora OS and Sphinx Speech Recognition. Right off the bat, there are no instructions or guides on the wiki that deal with duplicating an experiment and running it. So my train of thought was to copy the entire Exp/0110 directory into a new experiment directory, which I named 0206. The copying took a while, but all the files successfully copied over into the new experiment directory. Next, I edited the sphinx_train.cfg file to include the new experiment number. Next, I renamed all the files starting with  into. I was a bit surprised that there were only two such files, because when I did my test experiment from scratch there were five such files. I proceeded to run the train  for this new-old experiment. As expected, I received an error stating that it couldn't find a dictionary file; of course it couldn't, since experiment 0110 never had one. At this point I went back and looked at the experiment 0110 log, which stated that it was based on experiment 0107. I went into the experiment 0107 directory, and inside the /etc folder I found the dictionary, phone, and filler files that are missing from experiment 0110. I copied all three files into my experiment 0206/etc folder. I ran the train script again from my base experiment folder and this time it worked. A number of WARNINGS were showing, but no major ERROR that would cause the train script to fail. Well, it seems I started my celebration a bit prematurely, as after five or so minutes the train script stopped with an ERROR claiming that a file was missing, specifically the  file. Inside the /trees folder there was a file named 0206.unpruned, but nothing at the path shown in the terminal, 0206.unpruned/ER2-0.dtree (which treats 0206.unpruned as a directory).
This is where I was forced to stop any further testing, as I didn't know how to fix such an issue.
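The missing dictionary, phone and filler files could have been caught before starting the train. A minimal Python sketch of such a pre-flight check, assuming SphinxTrain's default naming in `sphinx_train.cfg` (files named after `$CFG_DB_NAME` inside the experiment's `etc/` folder); if an experiment's config overrides these names, the list needs adjusting. The function name is hypothetical.

```python
from pathlib import Path

# Assumed file names: SphinxTrain's default sphinx_train.cfg builds these
# from $CFG_DB_NAME inside the experiment's etc/ folder. Adjust the list
# if an experiment's config overrides them.
REQUIRED_SUFFIXES = (
    ".dic",                  # dictionary
    ".phone",                # phone list
    ".filler",               # filler dictionary
    "_train.fileids",        # list of training utterances
    "_train.transcription",  # training transcripts
)


def missing_training_files(exp_dir, db_name):
    """Return the expected etc/ files that don't exist yet."""
    etc = Path(exp_dir) / "etc"
    return [str(etc / (db_name + suffix))
            for suffix in REQUIRED_SUFFIXES
            if not (etc / (db_name + suffix)).is_file()]


# Hypothetical usage before starting the train:
# print(missing_training_files("/mnt/main/Exp/0206", "0206"))
```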

I feel like there is a simpler way of duplicating and running an experiment without copying files from multiple previous experiments. Hopefully someone from the Modeling Group will have time to explain it and show me how to do it at our next meeting.

Week Ending March 18, 2014

 * Task:
 * Contact the Modeling Group about an experiment the Systems Group could use as the basis for our experiments
 * Use the experiment suggested by the Modeling Group as the basis for my new experiment on the Rome machine
 * Update the group on my findings and help them run their own experiments

3/13 I emailed my group colleagues to find out where they stand with their tasks. I suggested a group plan for this week that includes finishing the updates to the necessary wiki guides, finding a solution to our Fedora keygen issue, fixing the previously installed backup drive on Rome, attempting a full Caesar backup (if time permits after Wednesday's meeting), and successfully running an experiment on Rome.
 * Results:

I have also emailed Colby Johnson regarding an experiment that the Systems Group could use to test Fedora OS on Rome. My previous attempt with experiment 0110 was missing a lot of files, and the train script threw an error while running; this new experiment needs to be complete and require minimal modifications in order to run successfully.

3/16 Colby J. suggested that we in the Systems Group use Speech:Exps 0168 for our Fedora OS experiments on Rome. Today I created a new entry in the Experiment Logs section, Speech:Exps 0210, and set up Exp 0210 inside  directory. Colby wasn't sure about the script that supposedly helps in duplicating an experiment, so I decided to manually copy the Exp 0168 directory into a new Exp 0210 directory. Using  command, this process as usual took a few minutes to complete due to the vast number of files. Once it completed, I edited the Exp parameters inside  to reflect Exp 0210. Then I changed the prefix of the dictionary, phone, trans, filler, and fileids files from 0168 to 0210. Tomorrow I will run the train on this experiment and hope it completes successfully, unlike my previous two attempts.
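Since no duplication script was available, the copy-and-rename steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not an existing tool: it assumes experiment files carry the experiment number as a filename prefix (as with the dictionary, phone, trans, filler and fileids files here) and that `etc/sphinx_train.cfg` refers to the experiment only through that number. It renames every file or directory whose name starts with the old number, then replaces the old number, as a whole word, inside the config. The function name is hypothetical.

```python
import os
import re
import shutil


def duplicate_experiment(exp_root, old_id, new_id):
    """Copy exp_root/old_id to exp_root/new_id and retarget it to new_id."""
    src = os.path.join(exp_root, old_id)
    dst = os.path.join(exp_root, new_id)
    # Keep symlinks as links (like cp -a); fails if dst already exists.
    shutil.copytree(src, dst, symlinks=True)

    # Bottom-up walk: children are renamed before their parent directory,
    # so no path still waiting to be processed is invalidated.
    for dirpath, dirnames, filenames in os.walk(dst, topdown=False):
        for name in filenames + dirnames:
            if name.startswith(old_id):
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, new_id + name[len(old_id):]))

    # Point the config at the new number; \b avoids touching e.g. "10168".
    cfg = os.path.join(dst, "etc", "sphinx_train.cfg")
    if os.path.isfile(cfg):
        with open(cfg) as f:
            text = f.read()
        with open(cfg, "w") as f:
            f.write(re.sub(rf"\b{re.escape(old_id)}\b", new_id, text))
    return dst


# Hypothetical usage mirroring this entry:
# duplicate_experiment("/mnt/main/Exp", "0168", "0210")
```

Renaming bottom-up matters if an experiment contains prefixed directories (for example a `trees/<id>.unpruned` folder); the config edit is deliberately crude and is worth reviewing by hand afterwards.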

3/17 I ran a train on the Exp 0210 that I set up yesterday. The train finished successfully, or at least I think it did (there was no confirmation message). During the train process there were a lot of ERROR and WARNING messages, but they didn't seem to bother the train process, as it ran for about an hour and ten minutes, the same time span as Exp 0168. I'm glad I was finally able to run a train from start to end. On to running a decoder, which I have no experience with; hopefully there won't be any errors.

One concern: Exp 0168 already had a language model, and it was copied over to my Exp 0210. Now I don't know whether I need to create a new language model or run with the one already there. I'll email Colby for clarification before I continue with Exp 0210.

3/18 My first attempt at creating a new language model by running  script generated only three files, compared to the six files in Exp 0168. Rerunning the script without changing anything generated all the files; I don't know why the first attempt failed.

Trying to run a decoder was futile, as it turned out Rome was missing files in the /usr/local directory. Running the run_decode2.pl script produced an error that sphinx3_decode doesn't exist inside /usr/local/bin; in fact, the folder was completely empty. Colby was also baffled by this, since Caesar had the file at that location. At this point I decided to manually copy Caesar's /usr/local directory over to Rome. Now when running the run_decode2.pl script, a new error showed up in the decode.log. This was a confusing error considering that libs3decoder.so.0 was inside its normal /usr/local/lib directory. While looking through the /mnt/main/ directory, a local folder caught my eye. The files inside it looked the same as those in Caesar's /usr/local, and it hit me that /usr/local is most likely a soft link to /mnt/main/local/ that somehow got disconnected. I logged back in as root, removed the local folder I had previously copied into Rome's /usr/ directory, and ran  command, which created the soft link for /mnt/main/local.
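The fix in this entry was to put the /usr/local -> /mnt/main/local soft link back. A minimal Python sketch of doing that safely with standard `os` calls; the function name is hypothetical, and for /usr/local it would need to run as root. Like the manual fix, it refuses to replace a real directory, which is why the copied local folder had to be removed first.

```python
import os


def ensure_symlink(link_path, target):
    """Make link_path a symbolic link to target.

    Returns True if a link was created or replaced, False if the correct
    link was already in place. A real file or directory at link_path is
    never removed; FileExistsError is raised instead.
    """
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False  # already correct
        os.remove(link_path)  # wrong or dangling link: remove only the link
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return True


# Hypothetical usage for the case above (needs root):
# ensure_symlink("/usr/local", "/mnt/main/local")
```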

Of course, that didn't solve the issue of not finding libs3decoder.so.0. I knew from reading his logs earlier in the semester that Eric was the one who installed Fedora on Rome, so I went back to his log in the hope of finding a solution. In less than five minutes I hit the JACKPOT: he had indeed run into the same issue and provided a solution to it.

Eric Beikman Solution:
 * When starting the decode script, I encountered another issue:
 * /usr/local/bin/sphinx3_decode: error while loading shared libraries: libs3decoder.so.0: cannot open shared object file: No such file or directory
 * Previously I was able to resolve this issue on rome by setting the "$LD_LIBRARY_PATH" to /usr/local/lib
 * I have a better solution now:
 * Make a file located at and called: /etc/ld.so.conf.d/sphinx.conf
 * Add /usr/local/lib to the file.
 * Execute sudo ldconfig to reload the shared libraries.

I didn't need to do the first two steps, since Rome already had the /etc/ld.so.conf.d/sphinx.conf file with the /usr/local/lib entry inside it. I only ran the  command, which solved the issue, and I was then able to run the run_decode2.pl script.
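Eric's fix works because the dynamic linker finds libraries outside its default directories (such as /usr/local/lib here) either through `LD_LIBRARY_PATH` or through the cache that `ldconfig` builds from /etc/ld.so.conf and the files it includes, such as /etc/ld.so.conf.d/*.conf. A minimal Python sketch that reports which .conf files list a given directory, assuming one directory per line as in the sphinx.conf described above; the function name is hypothetical. Finding the entry is not enough on its own, which matches this case: the file was already there, and only re-running ldconfig fixed it.

```python
from pathlib import Path


def listed_in_ld_conf(lib_dir, conf_dir="/etc/ld.so.conf.d"):
    """Return names of *.conf files in conf_dir that list lib_dir.

    Assumes one directory per line; text after '#' is a comment.
    Trailing slashes are ignored when comparing paths.
    """
    wanted = lib_dir.rstrip("/")
    hits = []
    for conf in sorted(Path(conf_dir).glob("*.conf")):
        for line in conf.read_text().splitlines():
            entry = line.split("#", 1)[0].strip()
            if entry and entry.rstrip("/") == wanted:
                hits.append(conf.name)
                break
    return hits


# Hypothetical usage on Rome. Even when the entry is present, the cache
# must be rebuilt with `sudo ldconfig` before sphinx3_decode can load
# libs3decoder.so.0:
# print(listed_in_ld_conf("/usr/local/lib"))
```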

 * Special Thanks To:
Colby Johnson

Week Ending March 25, 2014

 * Task:
 * Look into key-gen issue on Rome
 * Look into the "Broken Pipe" issue while running the decoder on Rome from my VirtualBox system
 * Solve the backup drive issue on Rome after class on Wednesday

3/24
 * Results:
 * A summary of the last few days, since I was unable to log them at the time.
 * 3/19
 * After having my experiment 0210 crash twice while decoding on my VirtualBox system, I was able to successfully complete the decoder run in the afternoon. The issue I encountered at home while running the decoder is the . The decoder would run fine for a couple of hours and then crash; specifically, I got booted off the Rome and Caesar systems with only a "Broken Pipe" message in my terminal. The Modeling Group was baffled by the issue when I discussed it with them, as they had never experienced it, so I decided to rerun the decoder from one of the computers in room P132. After five or so hours the decoder successfully finished its job and I was able to score it with SCLite. The results were similar to the original experiment 0168.
 * 3/22
 * I looked into the Broken Pipe issue, and it seems to occur when the host or client does not send any data over the SSH connection for a certain period of time, after which the connection is dropped. The suggested solution is to add  to the /etc/ssh/ssh_config file. I have not tried the solution yet, so I'm not sure whether it will solve my Broken Pipe issue.
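The option that was added isn't recorded above. One commonly used client-side setting for this symptom is OpenSSH's `ServerAliveInterval` (together with `ServerAliveCountMax`): when no data has been received from the server for that many seconds, ssh sends a message through the encrypted channel requesting a response, which can keep an idle connection from being dropped. Treat it as an assumption that this is the relevant option. A minimal Python sketch that appends such a block to ssh_config text only if `ServerAliveInterval` isn't already set; the values 60 and 3 are example values, and the function name is hypothetical.

```python
def add_keepalive(config_text, interval=60, count_max=3):
    """Append a 'Host *' keepalive block unless ServerAliveInterval is set.

    interval and count_max are example values (assumptions), not settings
    taken from this log.
    """
    for line in config_text.splitlines():
        # ssh_config allows "Keyword value" or "Keyword=value".
        fields = line.replace("=", " ", 1).split()
        # Keywords are case-insensitive; commented-out lines start with '#'.
        if fields and fields[0].lower() == "serveraliveinterval":
            return config_text  # already configured: leave it alone
    block = (
        "Host *\n"
        f"    ServerAliveInterval {interval}\n"
        f"    ServerAliveCountMax {count_max}\n"
    )
    return config_text.rstrip("\n") + "\n\n" + block


print(add_keepalive("Host caesar\n    User example\n"))
```

Appending at the end is deliberate: ssh_config also uses the first value obtained for each parameter, so a trailing `Host *` block acts as a default that earlier host-specific settings still override.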

As for today (3/24), I spent a few hours trying to solve the passwordless login on Rome. I tried changing a bunch of parameters inside the /etc/ssh/sshd_config file, changed the permissions on the ~/.ssh folder and the files inside it, and on top of that disabled the firewall (I think I did), but it was the same thing: it keeps asking for a password. Reading forums that deal with setting up keygen didn't provide me with any working solution. I don't know what the problem could possibly be. The sshd_config files from Caesar and Automatix seem to have settings identical to Rome's.
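One common reason sshd keeps asking for a password despite a correct authorized_keys file is its `StrictModes` check (on by default): it ignores the key file if the home directory, ~/.ssh or authorized_keys is writable by group or others, or is not owned by the user or root. With a shared home directory, ownership can also look wrong if the same user has a different UID on each machine. Whether either is the cause here isn't established in this log. A minimal Python sketch of that subset of the checks (sshd's own check is somewhat broader); the function name is hypothetical.

```python
import os
import stat


def strictmodes_problems(home, uid=None):
    """List ownership/permission problems on the paths sshd checks for key login.

    Covers the home directory, ~/.ssh and ~/.ssh/authorized_keys only:
    each should be owned by the user (or root) and not writable by group
    or others.
    """
    uid = os.getuid() if uid is None else uid
    problems = []
    ssh_dir = os.path.join(home, ".ssh")
    for path in (home, ssh_dir, os.path.join(ssh_dir, "authorized_keys")):
        try:
            st = os.stat(path)
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        if st.st_uid not in (uid, 0):
            problems.append(f"{path}: owned by uid {st.st_uid}")
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            mode = stat.S_IMODE(st.st_mode)
            problems.append(f"{path}: writable by group or others (mode {mode:o})")
    return problems


# Hypothetical usage, run as the affected user on Rome:
# print(strictmodes_problems(os.path.expanduser("~")))
```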

3/25 Spent some time researching how to format a new hard drive in Linux using the terminal. While using some suggested commands to check the current hard drives, I found that the hard drive we installed several weeks back is in fact being recognized by the system, which is weird because on that day it didn't recognize the hard drive. It reads as 300 GB, which is the one we installed. I'm hoping that after our meeting tomorrow we will be able to format it, as it seems to be a straightforward process, nothing too fancy.
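To confirm that a newly installed drive is seen and how big it is, the kernel's /proc/partitions lists every block device with its size in 1 KiB blocks. A minimal Python sketch that converts those sizes to decimal gigabytes; the sample device line is hypothetical, chosen so a 300 GB drive comes out to exactly 300. Tools such as `lsblk` show the same information directly.

```python
def disk_sizes_gb(partitions_text):
    """Map device name -> size in decimal GB from /proc/partitions text.

    The #blocks column of /proc/partitions counts 1 KiB (1024-byte) blocks.
    """
    sizes = {}
    for line in partitions_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # header line or blank line
        sizes[fields[3]] = int(fields[2]) * 1024 / 1e9
    return sizes


# Hypothetical sample; on a real machine use open("/proc/partitions").read().
sample = """major minor  #blocks  name

   8       16  292968750 sdb
"""
print(disk_sizes_gb(sample)["sdb"])  # 300.0
```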

The Systems Group had agreed to work on the backups after the meeting on Wednesday, but because the rest of my colleagues left, I was unable to do any work on the Rome backup drive.
 * Concerns:

Week Ending April 1, 2014

 * Task:
 * Work on getting passwordless login on Rome fixed (keygen)

3/29 The Systems Group successfully formatted the 300 GB hard drive that was installed on Rome several weeks ago. This hard drive will be used to back up our Sphinx Speech Recognition experiments located in the /mnt/main/Exp directory.
 * Results:

I took Prof. Jonas's suggestion of changing my user's home directory on Rome to test whether the keygen issue might be related to the shared user directory on the system. I edited  file to change my home directory from   to , then copied the authorized_keys file over to my new home/.ssh directory. I tried logging in from Caesar and was logged in automatically: the keygen worked. I have finally made progress on the keygen issue; it seems there is a problem with our shared home directory. While openSuse seems to work fine with our current setup, I'm not exactly sure why Fedora is having an issue with it. Maybe it doesn't like the permissions on the folders in which our home directories are located. I'll have a look at them tomorrow.
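The file edited in this entry isn't named, but on Linux a user's home directory is normally recorded in /etc/passwd (an assumption about what was edited here), as the sixth of seven colon-separated fields: name, password, UID, GID, comment, home directory, shell. A minimal Python sketch that reads that field from passwd-format text; the user name and paths in the sample are hypothetical. On a live system, the standard library's `pwd.getpwnam(user).pw_dir` returns the same field.

```python
def home_directory(passwd_text, user):
    """Return the home directory for user from /etc/passwd-format text.

    Each line is name:password:UID:GID:comment:home:shell, so the home
    directory is the sixth field (index 5). Returns None if user is absent.
    """
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) == 7 and fields[0] == user:
            return fields[5]
    return None


sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "exampleuser:x:1001:1001::/home/exampleuser:/bin/bash\n"
)
print(home_directory(sample, "exampleuser"))  # /home/exampleuser
```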

Week Ending April 8, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 15, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 22, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending April 29, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns:

Week Ending May 6, 2014

 * Task:


 * Results:


 * Plan:


 * Concerns: