DeepSpeech

mirror of https://github.com/mozilla/DeepSpeech.git synced 2025-10-26 11:19:39 +00:00

DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

deep-learning deepspeech embedded machine-learning neural-networks offline on-device speech-recognition speech-to-text tensorflow


Alexandre Lissy 7b2a409f9f Converting importers from multiprocessing.dummy to multiprocessing Fixes #2817		2020-03-18 11:04:36 +01:00
.github	Add lock bot config	2018-12-28 19:37:01 -02:00
bin	Converting importers from multiprocessing.dummy to multiprocessing	2020-03-18 11:04:36 +01:00
data	Transfer-learning support	2020-02-17 08:29:10 +01:00
doc	Doc: change cuDNN dependency to 7.6	2020-02-26 11:33:59 -08:00
examples	Remove example code	2019-12-10 16:25:00 +01:00
images	Updating Geometry	2019-12-02 11:04:27 +01:00
native_client	Merge branch 'pr/2794' (Fixes PR #2794 )	2020-02-26 14:47:55 +01:00
taskcluster	SDB support	2020-03-10 10:32:58 +01:00
util	Converting importers from multiprocessing.dummy to multiprocessing	2020-03-18 11:04:36 +01:00
.cardboardlint.yml	Update cardboardlint configuration	2019-10-04 13:56:41 +02:00
.compute	Ensure properly link to TensorFlow r1.15	2020-01-22 10:50:55 +01:00
.gitattributes	Address review comments and update docs	2020-02-11 19:44:36 +01:00
.gitignore	Sphinx doc	2019-09-24 18:22:45 +02:00
.gitmodules	Use submodule for building contrib examples into docs	2019-12-10 16:25:01 +01:00
.pylintrc	Fix linter errors	2020-02-11 19:44:36 +01:00
.readthedocs.yml	Re-enable readthedocs.io	2019-09-24 10:55:26 +02:00
.taskcluster.yml	Use KVM for Android emulator	2020-02-26 19:49:02 +01:00
.travis.yml	Enforce proper line ending removal when reading alphabet	2020-03-06 15:19:56 +01:00
bazel.patch	Proper re-use of Bazel cache	2018-01-31 18:50:36 +01:00
BIBLIOGRAPHY.md	Update BIBLIOGRAPHY.md	2020-02-21 16:59:23 +00:00
build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR	Move to ARMbian Buster	2019-08-21 22:58:10 +02:00
CODE_OF_CONDUCT.md	Add Mozilla Code of Conduct file	2019-03-29 14:58:39 -07:00
CONTRIBUTING.rst	Move from Markdown to reStructuredText	2019-10-04 12:07:32 +02:00
DeepSpeech.py	Merge pull request #2779 from reuben/export-metadata	2020-03-17 12:08:35 -03:00
Dockerfile	Update Dockerfile	2020-03-07 01:33:32 +05:30
evaluate_tflite.py	Remove hardcoded constants from evaluate_tflite.py	2020-02-12 13:08:34 +01:00
evaluate.py	Remove unneeded Saver instances	2020-02-17 15:55:16 +01:00
GRAPH_VERSION	Bump graph version	2020-01-24 10:20:42 +01:00
ISSUE_TEMPLATE.md	Create an issue template	2017-11-27 16:40:59 -02:00
LICENSE	Added LICENSE	2016-09-20 19:12:29 +02:00
lm_optimizer.py	Renamed optimizer	2020-03-17 09:56:33 +01:00
README.rst	Make readthedocs link more obvious	2020-03-12 13:31:42 -03:00
RELEASE.rst	Move from Markdown to reStructuredText	2019-10-04 12:07:32 +02:00
requirements_eval_tflite.txt	Update evaluate_tflite requirements	2020-01-12 11:02:15 +01:00
requirements_tests.txt	Converting importers from multiprocessing.dummy to multiprocessing	2020-03-18 11:04:36 +01:00
requirements_transcribe.txt	Make webrtcvad really optional	2020-02-24 12:08:12 +01:00
requirements.txt	Merge pull request #2783 from mozilla/optuna	2020-03-17 09:58:01 +01:00
stats.py	Introducing utils.helpers for miscellaneous helper functions	2020-01-14 16:04:18 +01:00
SUPPORT.rst	Point people to Matrix room instead of IRC	2020-02-11 17:44:44 +01:00
transcribe.py	Fix transcribe.py - use new checkpoint load method	2020-02-21 15:56:58 +00:00
VERSION	Bump VERSION to 0.7.0-alpha.2	2020-02-17 12:12:57 +01:00

README.rst

Project DeepSpeech
==================


.. image:: https://readthedocs.org/projects/deepspeech/badge/?version=latest
   :target: http://deepspeech.readthedocs.io/?badge=latest
   :alt: Documentation


.. image:: https://community-tc.services.mozilla.com/api/github/v1/repository/mozilla/DeepSpeech/master/badge.svg
   :target: https://community-tc.services.mozilla.com/api/github/v1/repository/mozilla/DeepSpeech/master/latest
   :alt: Task Status


DeepSpeech is an open-source speech-to-text engine that uses a model trained with machine learning techniques based on `Baidu's Deep Speech research paper <https://arxiv.org/abs/1412.5567>`_. Project DeepSpeech uses Google's `TensorFlow <https://www.tensorflow.org/>`_ to make the implementation easier.

**NOTE:** This documentation applies only to the **master branch** of DeepSpeech. **Documentation for the latest stable release** is published on `deepspeech.readthedocs.io <http://deepspeech.readthedocs.io/?badge=latest>`_.

To install and use DeepSpeech, run:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-venv/
   source $HOME/tmp/deepspeech-venv/bin/activate

   # Install DeepSpeech
   pip3 install deepspeech

   # Download pre-trained English model and extract
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz
   tar xvf deepspeech-0.6.1-models.tar.gz

   # Download example audio files
   curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/audio-0.6.1.tar.gz
   tar xvf audio-0.6.1.tar.gz

   # Transcribe an audio file
   deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --scorer deepspeech-0.6.1-models/kenlm.scorer --audio audio/2830-3980-0043.wav

A pre-trained English model is available and can be downloaded by following `these instructions <doc/USING.rst#using-a-pre-trained-model>`_. A package of example audio files can be downloaded from our `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_.
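
The quick-start commands above repeat the release number in every URL and path. As an illustrative sketch only, the helpers below factor it into one variable. ``DS_VERSION``, ``release_url`` and ``fetch_release_asset`` are hypothetical names, not part of DeepSpeech, and the sketch assumes the ``v<version>/<asset>`` URL layout used by the 0.6.1 links; asset names may differ in other releases.

.. code-block:: bash

   # Hypothetical helpers: build and fetch DeepSpeech release asset URLs.
   # Assumes the v<version>/<asset> layout of the 0.6.1 links above.
   DS_VERSION="${DS_VERSION:-0.6.1}"

   # Print the download URL of a release asset, e.g. "audio-0.6.1.tar.gz".
   release_url() {
       printf 'https://github.com/mozilla/DeepSpeech/releases/download/v%s/%s\n' \
           "$DS_VERSION" "$1"
   }

   # Download an asset unless it is already present, then extract it.
   fetch_release_asset() {
       local asset="$1"
       if [ ! -f "$asset" ]; then
           # -f: fail on HTTP errors, -L: follow redirects, -O: keep remote file name
           curl -fLO "$(release_url "$asset")" || return 1
       fi
       tar xvf "$asset"
   }

Sourcing these helpers and running ``fetch_release_asset "audio-${DS_VERSION}.tar.gz"`` would repeat the example-audio step. An asset that is already on disk is extracted again but not re-downloaded.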

Faster inference is possible with a supported NVIDIA GPU on Linux. See the `release notes <https://github.com/mozilla/DeepSpeech/releases/latest>`_ for the list of supported GPUs. To run ``deepspeech`` on a GPU, install the GPU-specific package:

.. code-block:: bash

   # Create and activate a virtualenv
   virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
   source $HOME/tmp/deepspeech-gpu-venv/bin/activate

   # Install DeepSpeech CUDA enabled package
   pip3 install deepspeech-gpu

   # Transcribe an audio file (uses the model and audio downloaded in the first example)
   deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --scorer deepspeech-0.6.1-models/kenlm.scorer --audio audio/2830-3980-0043.wav

Please ensure you have the required `CUDA dependencies <doc/USING.rst#cuda-dependency>`_.
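
The CPU and GPU packages are installed into separate virtualenvs above. A setup script that wants to choose one automatically could use a heuristic like the sketch below. ``choose_deepspeech_package`` is a hypothetical helper, not part of DeepSpeech. It only checks that the host runs Linux and that ``nvidia-smi`` executes successfully, which does not guarantee that the installed CUDA and cuDNN versions match the required CUDA dependencies.

.. code-block:: bash

   # Hypothetical helper: print the name of the pip package to install.
   # Heuristic only: a working nvidia-smi suggests an NVIDIA driver is present,
   # but says nothing about the installed CUDA/cuDNN versions.
   choose_deepspeech_package() {
       if [ "$(uname -s)" = "Linux" ] \
           && command -v nvidia-smi >/dev/null 2>&1 \
           && nvidia-smi >/dev/null 2>&1; then
           echo "deepspeech-gpu"
       else
           echo "deepspeech"
       fi
   }

For example, ``pip3 install "$(choose_deepspeech_package)"`` inside a fresh virtualenv. Falling back to the CPU package keeps such a script usable on machines without a working NVIDIA driver.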

See the output of ``deepspeech -h`` for more information on using ``deepspeech``. If you experience problems running ``deepspeech``, please check the `required runtime dependencies <native_client/README.rst#required-dependencies>`_.
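
The ``--audio`` flag takes a single file, so transcribing every example clip in ``audio/`` means running the client once per file. The loop below is a minimal sketch that reuses the flags and model paths from the quick-start command. ``transcribe_all`` is a hypothetical helper name, and its default paths assume the 0.6.1 model directory extracted earlier.

.. code-block:: bash

   # Hypothetical helper: run the deepspeech client on every .wav file in a directory.
   # Default paths mirror the quick-start command above.
   transcribe_all() {
       local audio_dir="${1:-audio}"
       local model="${2:-deepspeech-0.6.1-models/output_graph.pbmm}"
       local scorer="${3:-deepspeech-0.6.1-models/kenlm.scorer}"
       local wav
       for wav in "$audio_dir"/*.wav; do
           # If nothing matches, the glob stays literal; skip it.
           [ -e "$wav" ] || continue
           # Prefix the client's output with the file name and a tab.
           printf '%s\t' "$wav"
           deepspeech --model "$model" --scorer "$scorer" --audio "$wav"
       done
   }

Running ``transcribe_all audio`` prints each file name followed by a tab and the client's output for that file.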

----

**Table of Contents**
  
* `Using a Pre-trained Model <doc/USING.rst#using-a-pre-trained-model>`_

  * `CUDA dependency <doc/USING.rst#cuda-dependency>`_
  * `Getting the pre-trained model <doc/USING.rst#getting-the-pre-trained-model>`_
  * `Model compatibility <doc/USING.rst#model-compatibility>`_
  * `Using the Python package <doc/USING.rst#using-the-python-package>`_
  * `Using the Node.JS package <doc/USING.rst#using-the-nodejs-package>`_
  * `Using the Command Line client <doc/USING.rst#using-the-command-line-client>`_
  * `Installing bindings from source <doc/USING.rst#installing-bindings-from-source>`_
  * `Third party bindings <doc/USING.rst#third-party-bindings>`_


* `Trying out DeepSpeech with examples <examples/README.rst>`_

* `Training your own Model <doc/TRAINING.rst#training-your-own-model>`_

  * `Prerequisites for training a model <doc/TRAINING.rst#prerequisites-for-training-a-model>`_
  * `Getting the training code <doc/TRAINING.rst#getting-the-training-code>`_
  * `Installing Python dependencies <doc/TRAINING.rst#installing-python-dependencies>`_
  * `Recommendations <doc/TRAINING.rst#recommendations>`_
  * `Common Voice training data <doc/TRAINING.rst#common-voice-training-data>`_
  * `Training a model <doc/TRAINING.rst#training-a-model>`_
  * `Checkpointing <doc/TRAINING.rst#checkpointing>`_
  * `Exporting a model for inference <doc/TRAINING.rst#exporting-a-model-for-inference>`_
  * `Exporting a model for TFLite <doc/TRAINING.rst#exporting-a-model-for-tflite>`_
  * `Making a mmap-able model for inference <doc/TRAINING.rst#making-a-mmap-able-model-for-inference>`_
  * `Continuing training from a release model <doc/TRAINING.rst#continuing-training-from-a-release-model>`_
  * `Training with Augmentation <doc/TRAINING.rst#training-with-augmentation>`_

* `Contribution guidelines <CONTRIBUTING.rst>`_
* `Contact/Getting Help <SUPPORT.rst>`_