Commit Graph

29 Commits

Author SHA1 Message Date
Josh Meyer
73ebc50277 capitalize 2019-03-18 23:54:42 +01:00
Josh Meyer
c3ce06ede9 generate_trie, not kenlm 2019-03-18 23:53:28 +01:00
josh
715b8c52ec short readme for data dir 2019-02-28 01:42:16 +01:00
Reuben Morais
60fb5ad04c Remove old versions of decoder binary files 2018-11-08 18:35:42 -02:00
Reuben Morais
70ff71c4c5 Switch native tests to ctcdecode trie and prodmodel 2018-10-25 17:38:22 -03:00
Reuben Morais
3cc9b3711d Use ctcdecode in native client 2018-10-25 17:01:08 -03:00
Alexandre Lissy
7d2c0e27f7 Update trie file to v2 2018-10-02 16:21:56 +02:00
Alexandre Lissy
7c5e031b1f Change TRIE format 2018-10-01 19:03:36 +02:00
Reuben Morais
cb86e7e191 Update language model to a trie-based LM created from the LibriSpeech LM corpus 2018-09-17 11:11:20 -03:00
kdavis-mozilla
a96954b4a9 Quick #1485 fix. For now added @samgd 4-gram LM built from LibriSpeech train-* 2018-08-10 17:17:25 +02:00
kdavis-mozilla
5fb130012a Quick #1485 fix, for now added OpenSLR SLR11 3-gram 2018-08-09 23:47:12 +02:00
Alexandre Lissy
3f2a520941 Revert "Added quntized array language model and trie"
This reverts commit 754ecd831b.
2018-07-23 11:29:27 +02:00
Alexandre Lissy
650f7f851d Revert "Fixes #1236 (Switch KenLM to trie based language model)"
This reverts commit e34c52fcb9.
2018-07-20 18:56:57 +02:00
kdavis-mozilla
754ecd831b Added quntized array language model and trie 2018-06-01 20:34:05 +02:00
kdavis-mozilla
e34c52fcb9 Fixes #1236 (Switch KenLM to trie based language model) 2018-06-01 17:15:25 +02:00
Alexandre Lissy
d7653d749b Use LDC93S1 small language model
How to regenerate:
 - Get a KenLM build
 - $ (tr -d '[:digit:]|[:punct:]' < LDC93S1.txt | tr '[:upper:]' '[:lower:]'; head -n 500 ../lm/vocab.txt) > vocab.txt
 - $ bin/lmplz --prune 0 0 1 --order 5 --text vocab.txt --arpa vocab.pruned.arpa
 - $ bin/build_binary -s vocab.pruned.arpa vocab.pruned.lm
 - $ generate_trie ../alphabet.txt vocab.pruned.lm vocab.txt vocab.trie

Fixes #1245
2018-02-15 14:56:02 +01:00
Alexandre Lissy
72298b8f6d Add different samplerate samples
Fixes #1022
2018-02-08 20:48:53 +01:00
Kelly Davis
8886afc81e Revert "Added language model with apostrophe"
This reverts commit c5db7d1f71.
2017-11-17 13:41:12 +01:00
Kelly Davis
c5db7d1f71 Added language model with apostrophe 2017-11-13 15:35:10 +01:00
Reuben Morais
4a10d400da Save alphabet size in trie file and check it when loading 2017-11-01 19:58:57 -02:00
Reuben Morais
1f3d26ddda Address review comments 2017-09-13 11:42:41 -03:00
Reuben Morais
2cccd33452 Remove current re-scoring of decoder output and switch to custom op 2017-09-13 11:41:15 -03:00
Reuben Morais
1c4cbf1813 Support custom alphabet mappings (Fixes #692) (#797)
Support custom alphabet mappings
2017-08-31 11:51:15 +02:00
Kelly Davis
167d0acd99 Fixed #620 2017-06-05 21:47:36 +02:00
Kelly Davis
4765953b99 Fixed #412 2017-03-07 15:05:49 +01:00
Kelly Davis
9e266f2fc3 Temp fix of #8 until tensorflow/tensorflow#6034 is fixed 2017-02-20 07:10:51 +01:00
Kelly Davis
86d6ef08a8 Added Kneser-Ney, 4-gram, 30k word LM 2017-02-20 07:10:49 +01:00
Kelly Davis
a3abc9d92a Merge of pull requests #49, #50, and #52. Fixes issues #2, #4, #11, #12, #46, #47, and #48 2016-10-13 15:15:39 -04:00
Kelly Davis
9eebe98aa9 Adding CTC to notebook 2016-09-19 06:11:42 +02:00