mirror of
https://github.com/mozilla/DeepSpeech.git
synced 2025-10-26 11:19:39 +00:00
182 lines
9.0 KiB
ReStructuredText
182 lines
9.0 KiB
ReStructuredText
Using a Pre-trained Model
|
|
=========================
|
|
|
|
Inference using a DeepSpeech pre-trained model can be done with a client/language binding package. We have four clients/language bindings in this repository, listed below, and also a few community-maintained clients/language bindings in other repositories, listed `further down in this README <#third-party-bindings>`_.
|
|
|
|
|
|
* `The Python package/language binding <#using-the-python-package>`_
|
|
* `The Node.JS package/language binding <#using-the-nodejs-package>`_
|
|
* `The Command-Line client <#using-the-command-line-client>`_
|
|
* `The .NET client/language binding <native_client/dotnet/README.md>`_
|
|
|
|
Running ``deepspeech`` might, see below, require some runtime dependencies to be already installed on your system:
|
|
|
|
|
|
* sox - The Python and Node.JS clients use SoX to resample files to 16kHz.
|
|
* libgomp1 - libsox (statically linked into the clients) depends on OpenMP. Some people have had to install this manually.
|
|
* libstdc++ - Standard C++ Library implementation. Some people have had to install this manually.
|
|
* libpthread - On Linux, some people have had to install libpthread manually.
|
|
|
|
Please refer to your system's documentation on how to install these dependencies.
|
|
|
|
CUDA dependency
|
|
^^^^^^^^^^^^^^^
|
|
|
|
The GPU capable builds (Python, NodeJS, C++, etc) depend on the same CUDA runtime as upstream TensorFlow. Currently with TensorFlow 1.14 it depends on CUDA 10.0 and CuDNN v7.5. `See the TensorFlow documentation <https://www.tensorflow.org/install/gpu>`_.
|
|
|
|
Getting the pre-trained model
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If you want to use the pre-trained English model for performing speech-to-text, you can download it (along with other important inference material) from the DeepSpeech `releases page <https://github.com/mozilla/DeepSpeech/releases>`_. Alternatively, you can run the following command to download and unzip the model files in your current directory:
|
|
|
|
.. code-block:: bash
|
|
|
|
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz
|
|
tar xvfz deepspeech-0.5.1-models.tar.gz
|
|
|
|
Model compatibility
|
|
^^^^^^^^^^^^^^^^^^^
|
|
|
|
DeepSpeech models are versioned to keep you from trying to use an incompatible graph with a newer client after a breaking change was made to the code. If you get an error saying your model file version is too old for the client, you should either upgrade to a newer model release, re-export your model from the checkpoint using a newer version of the code, or downgrade your client if you need to use the old model and can't re-export it.
|
|
|
|
Using the Python package
|
|
^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Pre-built binaries which can be used for performing inference with a trained model can be installed with ``pip3``. You can then use the ``deepspeech`` binary to do speech-to-text on an audio file:
|
|
|
|
For the Python bindings, it is highly recommended that you perform the installation within a Python 3.5 or later virtual environment. You can find more information about those in `this documentation <http://docs.python-guide.org/en/latest/dev/virtualenvs/>`_.
|
|
|
|
We will continue under the assumption that you already have your system properly setup to create new virtual environments.
|
|
|
|
Create a DeepSpeech virtual environment
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In creating a virtual environment you will create a directory containing a ``python3`` binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on ``$HOME/tmp/deepspeech-venv``. You can create it using this command:
|
|
|
|
.. code-block::
|
|
|
|
$ virtualenv -p python3 $HOME/tmp/deepspeech-venv/
|
|
|
|
Once this command completes successfully, the environment will be ready to be activated.
|
|
|
|
Activating the environment
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Each time you need to work with DeepSpeech, you have to *activate* this virtual environment. This is done with this simple command:
|
|
|
|
.. code-block::
|
|
|
|
$ source $HOME/tmp/deepspeech-venv/bin/activate
|
|
|
|
Installing DeepSpeech Python bindings
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Once your environment has been set-up and loaded, you can use ``pip3`` to manage packages locally. On a fresh setup of the ``virtualenv``\ , you will have to install the DeepSpeech wheel. You can check if ``deepspeech`` is already installed with ``pip3 list``.
|
|
|
|
To perform the installation, just use ``pip3`` as such:
|
|
|
|
.. code-block::
|
|
|
|
$ pip3 install deepspeech
|
|
|
|
If ``deepspeech`` is already installed, you can update it as such:
|
|
|
|
.. code-block::
|
|
|
|
$ pip3 install --upgrade deepspeech
|
|
|
|
Alternatively, if you have a supported NVIDIA GPU on Linux, you can install the GPU specific package as follows:
|
|
|
|
.. code-block::
|
|
|
|
$ pip3 install deepspeech-gpu
|
|
|
|
See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
|
|
|
|
You can update ``deepspeech-gpu`` as follows:
|
|
|
|
.. code-block::
|
|
|
|
$ pip3 install --upgrade deepspeech-gpu
|
|
|
|
In both cases, ``pip3`` should take care of installing all the required dependencies. After installation has finished, you should be able to call ``deepspeech`` from the command-line.
|
|
|
|
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
|
|
|
|
.. code-block:: bash
|
|
|
|
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
|
|
|
|
The arguments ``--lm`` and ``--trie`` are optional, and represent a language model.
|
|
|
|
See `client.py <native_client/python/client.py>`_ for an example of how to use the package programatically.
|
|
|
|
Using the Node.JS package
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
You can download the Node.JS bindings using ``npm``\ :
|
|
|
|
.. code-block:: bash
|
|
|
|
npm install deepspeech
|
|
|
|
Please note that as of now, we only support Node.JS versions 4, 5 and 6. Once `SWIG has support <https://github.com/swig/swig/pull/968>`_ we can build for newer versions.
|
|
|
|
Alternatively, if you're using Linux and have a supported NVIDIA GPU, you can install the GPU specific package as follows:
|
|
|
|
.. code-block:: bash
|
|
|
|
npm install deepspeech-gpu
|
|
|
|
See the `release notes <https://github.com/mozilla/DeepSpeech/releases>`_ to find which GPUs are supported. Please ensure you have the required `CUDA dependency <#cuda-dependency>`_.
|
|
|
|
See `client.js <native_client/javascript/client.js>`_ for an example of how to use the bindings. Or download the `wav example <examples/nodejs_wav>`_.
|
|
|
|
Using the Command-Line client
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
To download the pre-built binaries for the ``deepspeech`` command-line (compiled C++) client, use ``util/taskcluster.py``\ :
|
|
|
|
.. code-block:: bash
|
|
|
|
python3 util/taskcluster.py --target .
|
|
|
|
or if you're on macOS:
|
|
|
|
.. code-block:: bash
|
|
|
|
python3 util/taskcluster.py --arch osx --target .
|
|
|
|
also, if you need some binaries different than current master, like ``v0.2.0-alpha.6``\ , you can use ``--branch``\ :
|
|
|
|
.. code-block:: bash
|
|
|
|
python3 util/taskcluster.py --branch "v0.2.0-alpha.6" --target "."
|
|
|
|
The script ``taskcluster.py`` will download ``native_client.tar.xz`` (which includes the ``deepspeech`` binary, ``generate_trie`` and associated libraries) and extract it into the current folder. Also, ``taskcluster.py`` will download binaries for Linux/x86_64 by default, but you can override that behavior with the ``--arch`` parameter. See the help info with ``python util/taskcluster.py -h`` for more details. Specific branches of DeepSpeech or TensorFlow can be specified as well.
|
|
|
|
Note: the following command assumes you `downloaded the pre-trained model <#getting-the-pre-trained-model>`_.
|
|
|
|
.. code-block:: bash
|
|
|
|
./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav
|
|
|
|
See the help output with ``./deepspeech -h`` and the `native client README <native_client/README.md>`_ for more details.
|
|
|
|
Installing bindings from source
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If pre-built binaries aren't available for your system, you'll need to install them from scratch. Follow these `\ ``native_client`` installation instructions <native_client/README.md>`_.
|
|
|
|
Third party bindings
|
|
^^^^^^^^^^^^^^^^^^^^
|
|
|
|
In addition to the bindings above, third party developers have started to provide bindings to other languages:
|
|
|
|
|
|
* `Asticode <https://github.com/asticode>`_ provides `Golang <https://golang.org>`_ bindings in its `go-astideepspeech <https://github.com/asticode/go-astideepspeech>`_ repo.
|
|
* `RustAudio <https://github.com/RustAudio>`_ provide a `Rust <https://www.rust-lang.org>`_ binding, the installation and use of which is described in their `deepspeech-rs <https://github.com/RustAudio/deepspeech-rs>`_ repo.
|
|
* `stes <https://github.com/stes>`_ provides preliminary `PKGBUILDs <https://wiki.archlinux.org/index.php/PKGBUILD>`_ to install the client and python bindings on `Arch Linux <https://www.archlinux.org/>`_ in the `arch-deepspeech <https://github.com/stes/arch-deepspeech>`_ repo.
|
|
* `gst-deepspeech <https://github.com/Elleo/gst-deepspeech>`_ provides a `GStreamer <https://gstreamer.freedesktop.org/>`_ plugin which can be used from any language with GStreamer bindings.
|
|
|