Discriminative Pre-training with large datasets

Jul 24, 2015 at 8:27 AM
I need to train a DNN on the TED-LIUM version 1 dataset (118 h) using discriminative pre-training (DPT). The DNN has 6 hidden layers with 2048 units each. I don't want to train each layer until convergence, since that is very time-consuming. Previously, I trained my networks using RBM pre-training.

I'd appreciate some tips on choosing a suitable number of epochs for each layer. Please advise.
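For reference, the layer-wise DPT schedule in question can be sketched as below. This is a minimal NumPy sketch, not any particular toolkit's implementation. It assumes sigmoid hidden layers, a softmax output trained with cross-entropy, and plain minibatch SGD. The `epochs_per_layer` parameter (a hypothetical name) caps how long the network is trained after each new layer is added, instead of training to convergence. Re-initialising the softmax layer every time a hidden layer is added is also an assumption of this sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_layer(rng, n_in, n_out):
    # Small random weights and zero bias, stored as a mutable [W, b] pair.
    return [rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)]


def forward(hidden, out, x):
    acts = [x]  # acts[i] is the input to hidden layer i
    for W, b in hidden:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts, softmax(acts[-1] @ out[0] + out[1])


def train(hidden, out, x, y, epochs, lr, batch, rng):
    # Backpropagation through ALL layers (hidden + softmax) for a fixed number of epochs.
    n = x.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            acts, probs = forward(hidden, out, x[idx])
            # Gradient of mean cross-entropy w.r.t. the softmax input.
            delta = probs.copy()
            delta[np.arange(len(idx)), y[idx]] -= 1.0
            delta /= len(idx)
            grad_W, grad_b = acts[-1].T @ delta, delta.sum(axis=0)
            # Propagate into the top hidden layer before updating the softmax weights.
            delta = (delta @ out[0].T) * acts[-1] * (1.0 - acts[-1])
            out[0] -= lr * grad_W
            out[1] -= lr * grad_b
            for i in range(len(hidden) - 1, -1, -1):
                W, b = hidden[i]
                gW, gb = acts[i].T @ delta, delta.sum(axis=0)
                if i > 0:  # propagate before W is modified
                    delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
                W -= lr * gW  # in-place, so hidden[i] is updated
                b -= lr * gb


def discriminative_pretrain(x, y, n_classes, n_layers, width,
                            epochs_per_layer, lr=0.5, batch=32, seed=0):
    rng = np.random.default_rng(seed)
    hidden, out = [], None
    for _ in range(n_layers):
        n_in = width if hidden else x.shape[1]
        # Stack a new randomly initialised hidden layer on top of the trained ones...
        hidden.append(init_layer(rng, n_in, width))
        # ...with a fresh softmax layer (an assumption of this sketch).
        out = init_layer(rng, width, n_classes)
        # Train the whole stack briefly rather than to convergence.
        train(hidden, out, x, y, epochs_per_layer, lr, batch, rng)
    return hidden, out
```

For the network described above this would be called with `n_layers=6` and `width=2048`, followed by a final fine-tuning run over the full stack (for example, the same `train` function with more epochs). A pure-NumPy loop like this only illustrates the schedule; it is not meant to process 118 hours of data.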