Question: how to read overlapped blocks of speech data frames?

Jan 27, 2015 at 7:45 PM
In Demos/Speech/TrainSimpleTimit.config the HTKMLFReader dim parameter is 792. This means, I think, that each input vector presented to the net is the concatenation of 11 frames (10 ms each) of 72 data values each (11 × 72 = 792), taken from the zda files (from HTKCopy).

Is there a way to specify that the next vector presented to the net should consist of 11 frames that partially overlap the previous one? For example, an overlap of 7 frames would mean a step of 4 frames (40 ms) from one vector to the next. This would yield more training vectors and might exploit the data more fully. In this case, how would one synchronize the phone labels in TimitLabels.mlf with the overlapped frames?

Thanks,
Richard
Coordinator
Jan 27, 2015 at 8:18 PM
Currently, the reader always advances by 1 frame (i.e., consecutive input vectors share 10 of their 11 frames).
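
To illustrate the arithmetic, here is a minimal Python sketch of stacking 11 frames of 72 values into 792-value vectors with a configurable step. It is not the HTKMLFReader implementation: the function name stack_frames, the toy data, and the choice to pair each vector with the label of its center frame are assumptions made for illustration, and the sketch does not pad at utterance boundaries, which the actual reader may handle differently.

```python
def stack_frames(frames, context=11, step=1):
    """Concatenate `context` consecutive frames into one vector, advancing by `step`.

    Returns the stacked vectors and, for each one, the index of its center
    frame (assumed here to be the frame whose label the vector would carry).
    """
    if context <= 0 or step <= 0:
        raise ValueError("context and step must be positive")
    vectors = []
    centers = []
    for start in range(0, len(frames) - context + 1, step):
        window = frames[start:start + context]
        # Flatten the window: 11 frames x 72 values -> one 792-value vector.
        vectors.append([value for frame in window for value in frame])
        centers.append(start + context // 2)
    return vectors, centers


# Toy utterance: 20 frames of 72 placeholder values each.
frames = [[float(t)] * 72 for t in range(20)]

# Step of 1 frame: consecutive vectors share 10 of their 11 frames.
dense, dense_centers = stack_frames(frames, context=11, step=1)

# Step of 4 frames: consecutive vectors share 7 of their 11 frames.
sparse, sparse_centers = stack_frames(frames, context=11, step=4)

print(len(dense[0]))      # 792
print(len(dense))         # 10
print(sparse_centers)     # [5, 9, 13]
```

Under this assumption a step of 1 frame gives one vector per labeled center frame, so frame-level labels line up one-to-one with the vectors. With a larger step, one would keep only the labels at the selected center frames.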