First off, some facts:
- simon does support importing PLS dictionaries
- simon does not support any explicit export functionality whatsoever. There are no export functions for the training data, the lexicon, the vocabulary or anything else.
- simon does not support the import of training data based on a supplied prompts file - be that in plain text or XML.
None of those features are missing for ideological reasons; it is mostly down to time constraints. However, I am not as convinced as ralfherzog that they are that essential.
As far as I know, simon is the only application using PLS dictionaries, so export functionality is a low-priority feature. The same goes for the training data. Integration with VoxForge is planned for the future, which in my opinion would be the only practical use case for export features right now anyway.
Some might wonder why we don't use PLS as the default dictionary format in the first place, but the answer is very simple: the PLS standard does not allow any terminal information to be stored with the dictionary. The current storage format is a standard Julius vocabulary file and an accompanying HTK dictionary. Those are the respective file formats of the underlying components, and as those components are not (yet) exchangeable, I see no reason to introduce new file formats.
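To illustrate the point about terminals, here is a rough Python sketch (not simon's actual import code) that reads a tiny PLS lexicon and writes it out in the Julius vocabulary layout. The sample lexicon, the function names and the `Unknown` terminal are made up for the example. The thing to notice is that the terminal has to be supplied from outside, because the PLS entries used here carry none. The phoneme strings are copied through untouched, which assumes they already match the phoneme set of the acoustic model.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C PLS 1.0 specification.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Hypothetical two-word lexicon, only for illustration.
SAMPLE_PLS = """<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="x-sampa" xml:lang="en">
  <lexeme>
    <grapheme>hello</grapheme>
    <phoneme>h @ l oU</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>world</grapheme>
    <phoneme>w 3: l d</phoneme>
  </lexeme>
</lexicon>
"""


def read_pls(text):
    """Return a list of (word, phonemes) pairs found in a PLS document."""
    root = ET.fromstring(text)
    entries = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text.strip()
                    for p in lexeme.findall("pls:phoneme", PLS_NS) if p.text]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            if not grapheme.text:
                continue
            # One entry per spelling/pronunciation combination.
            for phoneme in phonemes:
                entries.append((grapheme.text.strip(), phoneme))
    return entries


def to_julius_voca(entries, terminal):
    """Format entries as a Julius vocabulary block under a single terminal.

    The terminal is an argument because it cannot be read from the
    PLS entries; the caller has to decide on it.
    """
    lines = ["% " + terminal]
    lines += [f"{word}\t{phonemes}" for word, phonemes in entries]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "Unknown" is a placeholder terminal chosen for this example.
    print(to_julius_voca(read_pls(SAMPLE_PLS), "Unknown"), end="")
```

Every imported word ends up under the same placeholder terminal, so somebody still has to sort the words into their real terminals afterwards.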
The import of training data is included in simon 0.2, but only in a very basic form. In its current state it is usable if you have training data gathered by a previous simon installation. Everything else is not yet supported. I would personally like to see the import of a "normal" HTK prompts file, but I don't see the advantage in SSML. SSML is not designed for that particular usage and just introduces unnecessary overhead. Yes, content validation is a nice thing that makes XML a very good choice for many, many things, but prompts are imho not one of them. So we might see an import function for SSML-formatted prompts, for data that has already been gathered and stored in that format, but making it the primary storage format for prompts in simon is probably not going to happen anytime soon. It's the same as with PLS: HTK expects the prompts in that format, so why introduce an additional source of errors by adding another conversion step?
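For reference, a "normal" HTK prompts file is just one line per recording: an identifier for the audio file followed by the words that were spoken. A minimal Python sketch of reading such a file could look like the following. The function name and the sample lines are purely illustrative and not part of simon.

```python
def parse_prompts(text):
    """Parse prompts text into a dict mapping recording id -> list of words.

    Each non-empty line is expected to look like:
        <recording> <word> <word> ...
    """
    prompts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < 2:
            raise ValueError(
                f"line {lineno}: expected '<recording> <word> ...'")
        prompts[parts[0]] = parts[1:]
    return prompts


# Hypothetical sample content, for illustration only.
SAMPLE_PROMPTS = """\
*/sample1 DIAL ONE TWO THREE
*/sample2 CALL STEVE
"""

if __name__ == "__main__":
    for recording, words in parse_prompts(SAMPLE_PROMPTS).items():
        print(recording, "->", " ".join(words))
```

The whole format is handled by a whitespace split and one sanity check, which is why I don't think wrapping it in XML buys us much.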