mirror of https://github.com/mozilla/DeepSpeech.git synced 2025-10-26 11:19:39 +00:00

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.



Alexandre Lissy 34848dcda7 Switch dependency from xdg to pyxdg Fixes #129		2016-11-07 15:27:01 +01:00
bin	Fix TED LIUM automation path	2016-10-26 15:53:40 +02:00
data	Merge of pull requests #49 , #50 , and #52 . Fixes issues #2 , #4 , #11 , #12 , #46 , #47 , and #48	2016-10-13 15:15:39 -04:00
images	Fix of issue #23	2016-09-26 10:26:15 +02:00
resources	Fix #13 ; Fix #14 ; Reporting of WERs by index.htm	2016-10-14 18:34:05 +02:00
util	Switch dependency from xdg to pyxdg	2016-11-07 15:27:01 +01:00
.gitignore	Fixed #92	2016-10-26 11:12:26 +02:00
DeepSpeech.ipynb	Expose FULL_TRACE logging capability	2016-11-07 11:00:55 +01:00
index.htm	Fix #115 ; Time based charts now with timeline on the bottom	2016-11-04 12:23:59 +01:00
LICENSE	Added LICENSE	2016-09-20 19:12:29 +02:00
README.website.md	Switch dependency from xdg to pyxdg	2016-11-07 15:27:01 +01:00

README.website.md

Overview of the process for publishing WER

WER is tracked using the following workflow:

a dedicated user on the learning machine periodically runs training jobs (as a cron job or manually)
each run mainly produces js/hyper.js, containing a concatenated version of all previous runs
util/website.py contains code that connects to an SSH server using SFTP
this publishes index.htm and its dependencies
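
The publishing step can be sketched roughly as follows. This is not the code in util/website.py, only a minimal illustration under some assumptions: Python 3, the ds_website_* environment variables described in the dedicated-user setup, a default port of 22 when ds_website_server_port is unset, and an `sftp` object that offers `makedirs` and `put` in the way a `pysftp.Connection` does. The function names `connection_settings` and `publish` are hypothetical.

```python
import os
import posixpath


def connection_settings(environ=os.environ):
    """Collect SFTP settings from the ds_website_* environment variables."""
    return {
        "host": environ["ds_website_server_fqdn"],
        "username": environ["ds_website_username"],
        "private_key": environ["ds_website_privkey"],
        # Assumption: fall back to the standard SSH port when unset.
        "port": int(environ.get("ds_website_server_port", 22)),
    }


def publish(sftp, local_root, relative_paths, remote_root):
    """Upload files below local_root to remote_root, keeping relative paths.

    `sftp` is any object with makedirs(remotedir) and
    put(localpath, remotepath) methods, such as a pysftp.Connection.
    Returns the list of remote paths that were written.
    """
    uploaded = []
    for rel in relative_paths:
        # Remote paths always use forward slashes.
        remote = posixpath.join(remote_root, rel.replace(os.sep, "/"))
        sftp.makedirs(posixpath.dirname(remote))
        sftp.put(os.path.join(local_root, rel), remote)
        uploaded.append(remote)
    return uploaded


# Possible usage (requires pysftp and a reachable server):
#   import pysftp
#   with pysftp.Connection(**connection_settings()) as sftp:
#       publish(sftp, ".", ["index.htm", "js/hyper.js"],
#               os.environ["ds_website_server_root"])
```

Passing the connection in as an argument keeps the upload logic independent of the network, so it can be exercised with a stand-in object.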

Setup of the dedicated user:

Create a standard user
Either rely on system's tensorflow or populate a virtualenv
Using system tensorflow or a virtualenv might require setting the PYTHONPATH env variable (done for system wide tensorflow installation in the example below).
Install PIP dependencies:
jupyter
BeautifulSoup4
GitPython
pysftp
pyxdg
requests
Construct cron job:

SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin/:/bin
# Run WER every 5 mins
*/5 *  *   *   *    (mkdir -p $HOME/wer && cd $HOME/wer && source /usr/local/tensorflow-env/bin/activate && /usr/bin/curl -H "Cache-Control: no-cache" -L https://raw.githubusercontent.com/mozilla/DeepSpeech/website/util/automation.py | ds_website_username="u" ds_website_privkey="$HOME/.ssh/k" ds_website_server_fqdn="host.tld" ds_website_server_root="www/" ds_wer_automation="./bin/run-wer-automation.sh" python ; cd) 2>$HOME/.deepspeech_wer.err.log 1>$HOME/.deepspeech_wer.out.log

The cron task takes care of:
checking whether there were any new merges
cloning the git repo and checking out those merges
scheduling sequential execution against those merges
running the notebook, which automatically performs merging and upload if the proper environment variables are set, effectively updating the website on each iteration of the above process
saving the hyper.json files produced
wiping the cloned git repo
A 'lock' file is created in ~/.cache/deepspeech_wer/ to ensure that multiple executions are not triggered at the same time. An unexpected exception might leave a stale lock file behind
A 'last_sha1' file in the same directory keeps track of what was processed last
Logs of previous runs are saved to ~/.local/share/deepspeech_wer/
For debugging purposes, ~/.deepspeech_wer.err.log and ~/.deepspeech_wer.out.log collect stderr and stdout
Expose these environment variables (refer to util/website.py for details on each; the cron job above does this):
ds_website_username
ds_website_privkey
ds_website_server_fqdn
ds_website_server_port
ds_website_server_root
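
The 'lock' and 'last_sha1' bookkeeping described in this list can be sketched as follows. This is an illustration only, not the code in util/automation.py. It assumes Python 3, and the helper names (`exclusive_lock`, `read_last_sha1`, `write_last_sha1`) are hypothetical. The lock is released in a `finally` clause, which is one way to avoid leaving a stale lock file behind when a run raises an exception.

```python
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold lock_path for the duration of the with-block.

    Raises FileExistsError if the lock file already exists,
    meaning another run is (or appears to be) in progress.
    """
    directory = os.path.dirname(lock_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # O_EXCL makes creation fail atomically if the file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as handle:
            # Record the owner's PID to help diagnose stale locks.
            handle.write(str(os.getpid()))
        yield
    finally:
        # Runs on success and on exceptions raised inside the with-block.
        os.remove(lock_path)


def read_last_sha1(path):
    """Return the last processed commit SHA-1, or None if none is recorded."""
    try:
        with open(path) as handle:
            return handle.read().strip() or None
    except FileNotFoundError:
        return None


def write_last_sha1(path, sha1):
    """Record sha1 as the last processed commit."""
    with open(path, "w") as handle:
        handle.write(sha1 + "\n")
```

A run would wrap its whole body in `with exclusive_lock(...)`, skip the iteration when `FileExistsError` is raised, and call `write_last_sha1` only after a merge has been processed successfully. A lock can still go stale if the process is killed outright, since no `finally` clause runs in that case.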

Setup of web-facing server:

Ensure existing webroot
Generate an SSH key, and upload the public key to the web-facing server
Connect manually at least once from the training machine to the web-facing server to accept the server host key and populate the known_hosts file (pay attention to the FQDN)
Make sure that the server is configured with a proper DirectoryIndex (Apache, or the equivalent directive for other servers), whether system-wide or locally (with a .htaccess file, for example).
Bootstrap with an empty index.htm (and populate .htaccess if needed)
That should be all. After any big change to the HTML codebase, make sure to clean up the mess.