HTKMLFReader question

Sep 22, 2015 at 2:11 PM
From the documentation, I get the impression that the HTKMLF reader can read multiple feature vectors from an HTK-formatted file and combine them into one larger feature vector. The example in the documentation discusses a 72-dimensional feature vector, but the network is designed to use 11 frames of context, so the feature dimension is set to 792.

For this example, I wanted to confirm that the HTK file is saved with 72-dimensional feature vectors and that the HTKMLF reader then reads 11 consecutive frames to form the single 792-dimensional feature vector specified in the reader configuration.

I intend to construct a 2D audio spectrogram array for use with a CNN, and I see this as the cleanest way to accomplish it.

Also, is there any way to create overlap between the feature vectors, say 25% or 50%, without duplicating the data in the HTK file?
Sep 22, 2015 at 6:07 PM
Yes, this is correct. If your HTK feature files are of dimension M and you want a context window of L frames, you set the feature dimension in the CNTK configuration to M*L. So your understanding of how the feature reading works is correct.

I'm not sure I understand the last part of the question. In HTK (or whatever feature extractor you use) you decide the frame size and frame shift, for example 25 ms frames with a 10 ms shift. CNTK just packs these feature vectors into whatever context window you choose. There is no duplication of the data in the HTK file, because the context window is constructed by CNTK, not HTK.
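
To make the M*L arithmetic concrete, here is a rough NumPy sketch of packing frames into a context window. It is only an illustration, not CNTK's actual code: the function name `stack_context`, the ordering of frames inside the packed vector, and the handling of the first and last frames (repeating the boundary frame) are all my assumptions.

```python
import numpy as np

def stack_context(frames, context):
    """Pack each frame with its neighbours into one long vector.

    frames:  array of shape (T, M), one M-dimensional feature vector per frame
    context: odd window length L (the centre frame plus L // 2 on each side)
    returns: array of shape (T, M * L)
    """
    if context % 2 != 1:
        raise ValueError("context must be odd so the window has a centre frame")
    half = context // 2
    # Assumption: pad the utterance edges by repeating the first/last frame.
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    num_frames = frames.shape[0]
    # Column block i holds the frame at offset (i - half) from the centre.
    return np.hstack([padded[i:i + num_frames] for i in range(context)])

# 100 frames of 72-dimensional features, 11 frames of context
frames = np.random.rand(100, 72).astype(np.float32)
stacked = stack_context(frames, 11)
print(stacked.shape)  # (100, 792)
```

The feature file itself still holds only the 72-dimensional frames; the 792-dimensional vectors exist only after the packing step.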
Sep 22, 2015 at 7:30 PM

In my case I’m generating a continuous spectrogram, not frame-by-frame spectrograms where the overlap would be created during feature generation. I figured I would have to restructure the HTK file to create this overlap, unless there happens to be a way to do it with the reader.

FYI – I intend to reshape them into an image once read and apply a CNN to them, which is why I was thinking of adding overlap. Is that the intention of reading in multiple frames? I guess it would also support RNN-type architectures.
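
To illustrate the kind of overlap and reshaping I mean, here is a small NumPy sketch done outside of CNTK. It is not a reader feature; the function name `overlapping_patches`, the 12-frame patch length, and the random data are made up for the example. It cuts a continuous spectrogram into 2D patches whose windows overlap by a chosen fraction, using views so that the underlying frames are not copied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def overlapping_patches(spec, length, overlap):
    """Cut a spectrogram into overlapping 2D patches without copying data.

    spec:    array of shape (T, M), T frames of M frequency bins
    length:  number of frames per patch
    overlap: fraction of a patch shared with the next one, 0 <= overlap < 1
    returns: read-only view of shape (num_patches, M, length)
    """
    hop = int(round(length * (1.0 - overlap)))
    if hop < 1:
        raise ValueError("overlap is too large for this patch length")
    # All windows with a one-frame shift, then keep every hop-th window.
    windows = sliding_window_view(spec, window_shape=length, axis=0)
    return windows[::hop]

spec = np.random.rand(100, 72).astype(np.float32)
patches = overlapping_patches(spec, length=12, overlap=0.5)
print(patches.shape)                    # (15, 72, 12)
print(np.shares_memory(patches, spec))  # True
```

Each `patches[k]` is a 72 x 12 image (frequency by time) that could be fed to a CNN, and consecutive patches share 6 of their 12 frames.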

Ray Ritmiller