DeepSpeech

mirror of https://github.com/mozilla/DeepSpeech.git synced 2025-10-26 11:19:39 +00:00

Reuben Morais 65915c7f57 Address review comments		2020-07-02 14:09:42 +02:00

Language-Specific Data
======================

This directory contains language-specific data files. The most important ones are:

1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``. After installing the training code, run ``python -m deepspeech_training.util.check_characters --help`` to see the options of a tool that creates an alphabet file from a list of training CSV files.

2. A scorer package (``data/lm/kenlm.scorer``) generated with ``generate_scorer_package`` (``native_client/generate_scorer_package.cpp``). The scorer package includes a binary n-gram language model generated with ``data/lm/generate_lm.py``.
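
To illustrate what "a list of unique characters" means for item 1, here is a minimal sketch that collects the distinct characters from the transcripts of one or more training CSV files. It is not the implementation of ``deepspeech_training.util.check_characters``. The helper names ``collect_alphabet`` and ``format_alphabet`` are hypothetical, and the sketch assumes each CSV has a ``transcript`` column and that the alphabet file holds one character per line.

```python
import csv
import io


def collect_alphabet(csv_streams, column="transcript"):
    """Return the sorted unique characters found in `column` of each CSV.

    Hypothetical helper: assumes every CSV has a header row that
    includes a column named `column`.
    """
    chars = set()
    for stream in csv_streams:
        for row in csv.DictReader(stream):
            # Updating a set with a string adds each character once.
            chars.update(row[column])
    return sorted(chars)


def format_alphabet(chars):
    """Render the characters one per line (assumed alphabet file layout)."""
    return "\n".join(chars) + "\n"


# In-memory stand-in for a training CSV; the file names are made up.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1024,hello world\n"
    "b.wav,2048,hold on\n"
)

alphabet = collect_alphabet([sample])
print(alphabet)
```

With real data, the streams would be opened files (for example ``open(path, newline="", encoding="utf-8")``) and the result of ``format_alphabet`` would be written to an alphabet file. Note that the space character is part of the collected set, since it occurs in the transcripts.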

For more information on how to build these resources from scratch, see the ``External scorer scripts`` section on `deepspeech.readthedocs.io <https://deepspeech.readthedocs.io/>`_.