DeepSpeech

mirror of https://github.com/mozilla/DeepSpeech.git synced 2025-10-26 11:19:39 +00:00

Author	SHA1	Message	Date
Reuben Morais	a84abf813c	Deduplicate Alphabet implementations, use C++ one everywhere	2020-06-30 09:52:45 +02:00
Reuben Morais	33760a6bcd	Delay beam expansion until a non-blank label has probability >0.1%	2020-04-17 14:33:40 +02:00
Reuben Morais	5fa1839a7f	Use Alphabet to compute string values in get_prev_*	2020-04-13 10:23:23 +02:00
Reuben Morais	c1b1a59423	Score prefix as soon as a grapheme is formed rather than 1 byte later	2019-11-11 12:52:48 +01:00
Reuben Morais	f4cdd988df	UTF-8 target	2019-11-11 11:36:16 +01:00
Alexandre Lissy	ef3f8004ce	Use std::shared_ptr instead of raw pointer for dictionary_ (fixes #2403)	2019-10-18 10:15:59 +02:00
Reuben Morais	31d81740ee	Add debugging helpers to PathTrie	2019-10-15 12:49:38 +02:00
Reuben Morais	e3bf5d3cc6	Only update time step of leaf prefixes	2019-08-20 12:03:59 +02:00

    The intention of this check is to improve the accuracy of the timings by recording the time step where the character saw its highest probability rather than the first time step where it was seen. The problem happens when updating the time step of a prefix that already has children. In that case, if any of the children have a time step that is earlier than `new_timestep`, it'll break the linearity of the timings. My fix is to simply check that the prefix we're updating is a leaf.

    For example, say during decoding we have the following beams (format is `(char | time)`, tree node id below, nodes with same id are the same object):

    ```
    1. (-1 | 0 ) -> ('s' | 10) -> ('h' | 13) -> ('e' | 14)
           A             B             C             D
    2. (-1 | 0 ) -> ('s' | 10) -> ('h' | 14)
           A             B             E
    ```

    And the prefix list is [B, C, D, E]. Currently, if we process character 'h' in time step 15 with a probability higher than both C and E, we update both nodes to have time step 15, which breaks linearity in beam 1. With my fix, we only update node E, which is a leaf.

    In my tests this does fix the problem, but since we don't have any known good quality data to verify against, it's hard to know if it has other side effects.

Reuben Morais	31afc6811f	Switch to ConstFst from VectorFst and mmap trie file when reading	2019-06-21 22:21:19 -03:00
Reuben Morais	5bf6e63f1b	Record character timing offset at peak probability of character (uplifted from `173dddbe32`)	2019-02-17 22:41:18 -03:00
Reuben Morais	9e8c208be2	Don't save/load trie for character based LMs	2019-01-07 08:49:37 -02:00
Reuben Morais	c34fc5b7ac	Import parlance/ctcdecode into repository	2018-10-25 17:00:48 -03:00

12 Commits
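
The leaf-only rule described in commit e3bf5d3cc6 can be illustrated with a minimal Python sketch. This is not the repository's C++ code: the `Node` class, `update_timestep`, `is_linear`, `build_example`, and the probability values 0.5 and 0.9 are all hypothetical names and numbers chosen for illustration. Only the node layout (A, B, C, D, E), their time steps, and the incoming 'h' at time step 15 come from the commit message.

```python
class Node:
    """Hypothetical stand-in for a decoder prefix node (not the real PathTrie)."""

    def __init__(self, label, timestep, prob=0.0, parent=None):
        self.label = label
        self.timestep = timestep
        self.best_prob = prob  # highest probability seen so far (assumed field)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_leaf(self):
        return not self.children

    def timings(self):
        """Time steps along the path from the root to this node."""
        steps, node = [], self
        while node is not None:
            steps.append(node.timestep)
            node = node.parent
        return steps[::-1]


def update_timestep(node, new_timestep, prob, leaf_only=True):
    """Record new_timestep if prob is a new peak; optionally only for leaves."""
    if leaf_only and not node.is_leaf():
        return False  # the node has children: leave its time step untouched
    if prob <= node.best_prob:
        return False  # not a new peak
    node.best_prob = prob
    node.timestep = new_timestep
    return True


def is_linear(steps):
    """True when time steps never decrease along a beam."""
    return all(a <= b for a, b in zip(steps, steps[1:]))


def build_example():
    """Beams from the commit message: A -> B -> C -> D and A -> B -> E."""
    a = Node(-1, 0)
    b = Node("s", 10, 0.5, a)  # 0.5 is an assumed starting probability
    c = Node("h", 13, 0.5, b)
    d = Node("e", 14, 0.5, c)
    e = Node("h", 14, 0.5, b)
    return a, b, c, d, e


if __name__ == "__main__":
    for leaf_only in (False, True):
        a, b, c, d, e = build_example()
        # 'h' arrives at time step 15 with a probability above both C and E.
        for node in (c, e):
            update_timestep(node, 15, 0.9, leaf_only=leaf_only)
        print(leaf_only, d.timings(), e.timings(), is_linear(d.timings()))
```

With `leaf_only=False`, node C moves to time step 15 while its child D stays at 14, so beam 1 is no longer linear. With `leaf_only=True`, only the leaf E is updated and both beams keep non-decreasing time steps.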