30 Commits

Author SHA1 Message Date
Reuben Morais
699e4ebcd7 Revert to a pipelined approach for test epochs to avoid CPU OOM with large alphabets 2019-05-13 23:49:14 -03:00
Reuben Morais
13757a4258 Fix pylint warnings 2019-04-11 07:02:21 -03:00
Tilman Kamp
42f04dc9aa Fix #1972 2019-03-21 13:39:12 +01:00
Tilman Kamp
6c6a4e08ca Fix #1962 2019-03-18 18:53:04 +01:00
josh
32bf1a685a prettier error message to the terminal when alphabets are mis-matched. 2019-02-12 23:10:46 +01:00
Reuben Morais
12c62756c7 Switch wer_cer_batch to compute real CER over corpus 2019-02-05 09:29:47 -02:00
Reuben Morais
7a14bcc4de Clean up and split TensorFlow deps of text.py 2019-02-04 08:35:43 -02:00
Josh Meyer
76e81f34ff Merge pull request #1715 from JRMeyer/check-alphabet
Check for transcript & alphabet mis-match
2018-11-09 09:31:42 -08:00
josh
3c274d4f62 reubens comments, kept newlines because prettier 2018-11-08 14:16:41 -08:00
josh
5a8f88d922 added error message to text.py when the transcripts and alphabet.txt file dont match, and a file to find the unique character set from the {train/dev/test} csv files 2018-11-07 10:04:22 -08:00
Reuben Morais
bb4551caa9 Extend Python Alphabet with config file and decode method 2018-11-02 14:00:11 -03:00
Reuben Morais
9c338b76db Fix handling of Unicode messages when using custom alphabets (Fixes #849) 2017-09-26 14:26:41 -03:00
Reuben Morais
f3ea690b38 Make sure custom alphabet code works properly on Python 2 (#806) 2017-09-01 11:21:01 +02:00
Reuben Morais
1c4cbf1813 Support custom alphabet mappings (Fixes #692) (#797)
Support custom alphabet mappings
2017-08-31 11:51:15 +02:00
Reuben Morais
23fa1f71a5 Don't duplicate spaces in the source text when converting to integer labels 2017-08-02 17:00:30 -03:00
Reuben Morais
2da9e7849f Process corpora in a single pass when possible, and save their definition in CSV files for performance and cleaner code. 2017-04-14 17:21:48 -03:00
Mike C. Fletcher
bae4989660 Further fixes to get the initial test run to finish 2017-03-27 00:53:23 -04:00
Alexandre Lissy
3838c0a9ce Upgrade to run on Tensorflow 1.0.0 2017-02-22 14:48:47 +01:00
Reuben Morais
d6fb444287 Convert code comments to Sphinx RST docstrings 2017-02-02 23:42:36 -02:00
Andre Natal
32a436309e Switchboard importer 2017-01-03 12:17:02 -08:00
Reuben Morais
33c9521a6f Add validation and cleanup function to util/text.py 2016-12-21 14:37:21 -02:00
Kelly Davis
0707ad89d2 Merge branch 'master' into issue109_inputops 2016-11-09 12:26:18 +01:00
Reuben Morais
d989e8de09 Make sure the initializer passed to tf.scan doesn't break the API contract
We need to make sure the initializer shape matches the return value
of the callable passed to tf.scan.

This also adds an assertion on the shape of labels and the values
in label_lengths that enforces a condition that is needed for
ctc_label_dense_to_sparse to work.
2016-11-08 12:19:42 -02:00
Chris Lord
6178c31a20 Write a Tensorflow Serving client 2016-11-08 11:45:28 +01:00
Reuben Morais
182e20187a Switch importers to new input pipeline 2016-11-08 02:35:50 -02:00
Reuben Morais
ca98c5aab8 Expose text_to_char_array in util/text.py 2016-11-07 16:12:58 -02:00
Tilman Kamp
0849efd6da Fix #111; documented and revisited WER calculation 2016-11-01 16:11:04 +01:00
Kelly Davis
43303a2199 Fix #67
WER is calculated using Levenshtein distance on chars, not words
2016-10-17 12:48:20 -04:00
Kelly Davis
f8c0b57578 Fixed typo in wers 2016-10-17 08:47:27 -04:00
Kelly Davis
a3abc9d92a Merge of pull requests #49, #50, and #52. Fixes issues #2, #4, #11, #12, #46, #47, and #48 2016-10-13 15:15:39 -04:00