Convolutional neural network for speech recognition

Jan 16, 2015 at 5:55 PM
Is there any way to train a CNN-based acoustic model for speech recognition using NDL, similar to how we train a DNN-based acoustic model?
Coordinator
Jan 16, 2015 at 6:02 PM
Yes. You can find the NDL Convolution command in the document.
Jan 16, 2015 at 10:36 PM
Edited Jan 16, 2015 at 10:37 PM
Thanks. According to the document, the NDL Convolution command is designed for images. Does that mean we can treat the frames of speech features as images?
Coordinator
Jan 16, 2015 at 10:53 PM
Yes. A block of speech feature frames actually is an image.
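To make the "frames as an image" idea concrete, here is a minimal sketch in plain Python (not NDL). It stacks a context window of feature frames into a 2-D grid with frequency bands as rows and time as columns. The function name `frames_to_image` and the sizes (11 frames of context, 40 filterbank channels) are assumptions chosen for illustration; they do not come from CNTK or from this thread.

```python
def frames_to_image(frames):
    """Stack a context window of feature frames into a 2-D 'image'.

    frames: list of T frames, each a list of F feature values.
    Returns an F x T grid (rows = frequency bands, columns = time steps).
    """
    num_bands = len(frames[0])
    if any(len(frame) != num_bands for frame in frames):
        raise ValueError("all frames must have the same number of bands")
    # Transpose: one row per frequency band, one column per frame.
    return [[frame[band] for frame in frames] for band in range(num_bands)]


# Hypothetical sizes: 11 frames of context, 40 filterbank channels per frame.
# The value t * 100 + b just makes each cell easy to identify.
context = [[float(t * 100 + b) for b in range(40)] for t in range(11)]
image = frames_to_image(context)
```

A convolution filter would then slide over this grid the same way it slides over the pixels of a picture.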
Jan 20, 2015 at 8:14 PM
Thanks a lot! I have tried the Convolution command successfully. However, it seems that the convolution filter has no bias vector of its own to act as "shared biases", and we have to use a "global" bias vector on top of the pooling layer instead. Is that true?
Coordinator
Jan 20, 2015 at 8:42 PM
No, you can add a bias (using a plus node) to any node, including the output of the convolution, although people typically add it on top of the pooling layer. Also, a single-column bias is shared if the number of rows in the input column is a multiple of the number of rows in the bias.
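The sharing rule above can be sketched in plain Python (not NDL). The helper `add_shared_bias` is hypothetical, and the tiling order (input row i receives bias row i modulo the bias length) is an assumption about memory layout made for illustration; the actual order in CNTK depends on how the convolution output is laid out.

```python
def add_shared_bias(column, bias):
    """Add a short bias vector to a longer column by repeating the bias.

    Works only when len(column) is a multiple of len(bias), mirroring the
    condition described above. Tiling order is an assumption of this sketch.
    """
    if not bias or len(column) % len(bias) != 0:
        raise ValueError("input rows must be a multiple of bias rows")
    return [x + bias[i % len(bias)] for i, x in enumerate(column)]


# 6 input rows, 2 bias rows: each bias value is shared by 3 input rows.
shifted = add_shared_bias([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [10.0, 20.0])
```

With one bias row per feature map, every position within a feature map receives the same bias, which is the "shared bias" behavior asked about.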
Jan 20, 2015 at 10:48 PM
Thanks a lot! Here are some other questions: Could we do limited weight sharing for CNN using CNTK? Is there an easy way to handle frequency-independent features (e.g. log energy) for CNN using CNTK?
Coordinator
Jan 27, 2015 at 1:02 AM
We probably need a Reshape node to do limited weight sharing. With Reshape, you can use RowSlice to extract the bands you want and apply the CNN only to that slice.
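The slicing idea can be sketched in plain Python (not NDL). The `row_slice` helper below only imitates the notion of taking a contiguous range of rows; its argument order, the 41-row frame (40 filterbank values followed by 1 log-energy value), and the section boundaries are all assumptions made for illustration.

```python
def row_slice(start, num_rows, column):
    """Return num_rows consecutive rows of a column, beginning at start."""
    if start < 0 or num_rows < 0 or start + num_rows > len(column):
        raise ValueError("slice is out of range")
    return column[start:start + num_rows]


# Hypothetical frame: rows 0..39 are filterbank bands, row 40 is log energy.
frame = [float(i) for i in range(41)]

# Only the filterbank rows go to the convolution; log energy bypasses it
# and can be joined with the convolution output further up the network.
fbank = row_slice(0, 40, frame)
log_energy = row_slice(40, 1, frame)

# Limited weight sharing: overlapping frequency sections, each of which
# would get its own convolution filters (the filters are not shown here).
sections = [row_slice(start, 16, fbank) for start in (0, 12, 24)]
```

Each section covers 16 bands and overlaps its neighbor by 4 bands, so filters are shared only within a section rather than across the whole frequency axis.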