DNN Training resulting in NaNs for ReLUs + Dropout

May 18, 2015 at 4:01 AM
I am trying to train a DNN (7 hidden layers) with ReLUs + Dropout. Below is the configuration for the DNN (the criterion is CrossEntropyWithSoftMax):

layer 1

W1=Parameter(HDim, SDim)
B1=Parameter(HDim, 1)
T1=Times(W1, featNorm)
P1=Plus(T1, B1)
R1=ReLU(P1)
D1=Dropout(R1)

layer 2

W2=Parameter(HDim, HDim)
B2=Parameter(HDim, 1)
T2=Times(W2, D1) ###### Is the input to the T2 node D1?
P2=Plus(T2, B2)
R2=ReLU(P2)
D2=Dropout(R2)


...... (layers 3 through 7 repeat the same pattern, with layer 7 producing D7) ......

output layer

WO=Parameter(LDim, HDim)
BO=Parameter(LDim, 1)
TO=Times(WO, D7)
PO=Plus(TO, BO)
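
PO then feeds the criterion node, roughly like this ("labels" is my label input; I am writing this part from memory, so the tag syntax may not be exact):

CE = CrossEntropyWithSoftMax(labels, PO, tag="criterion")
Err = ErrorPrediction(labels, PO, tag="eval")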


For the first epoch I do not get any NaNs. However, from the second epoch onwards I end up getting NaNs:

Finished Epoch[1]: [Training Set] Train Loss Per Sample = 4.1712084 EvalErr Per Sample = 0.77799535 Ave Learn Rate Per Sample = 0.0003906250058 Epoch Time=1162.066
Finished Epoch[1]: [Validation Set] Train Loss Per Sample = 3.3579302 EvalErr Per Sample = 0.70523864
Finished Epoch[2]: [Training Set] Train Loss Per Sample = nan EvalErr Per Sample = 0.92654443 Ave Learn Rate Per Sample = 0.00390625 Epoch Time=1179.09
Finished Epoch[2]: [Validation Set] Train Loss Per Sample = nan EvalErr Per Sample = 0.92926109
Finished Epoch[3]: [Training Set] Train Loss Per Sample = nan EvalErr Per Sample = 0.93325633 Ave Learn Rate Per Sample = 0.00390625 Epoch Time=1180.084


I suspect that my config file is not accurate, because for DNNs without Dropout I get decent performance. Any suggestions on where the error might be?

Thanks.
May 18, 2015 at 4:15 AM
The config file seems to be correct, although you can simplify it significantly by using macros (though it's not fully clear from the excerpt how the criterion node is used). What dropout rate did you use in the SGD block? Often 0.1-0.2 is fine, and a larger value will decrease performance. Also, for ReLU you might want to use a slightly smaller learning rate for convergence. You can also debug the problem by setting traceLevel to 1 and seeing which minibatch causes the NaNs.
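For example, an untested sketch of such a macro (assuming NDL macro semantics, where a macro evaluates to its last assignment; the macro and layer names are illustrative):

ReLUDropoutLayer(inDim, outDim, x) = [
    W = Parameter(outDim, inDim)
    b = Parameter(outDim, 1)
    t = Times(W, x)
    z = Plus(t, b)
    r = ReLU(z)
    d = Dropout(r)    # the macro returns this last node
]

# usage: each call stands in for one hand-written layer
D1 = ReLUDropoutLayer(SDim, HDim, featNorm)
D2 = ReLUDropoutLayer(HDim, HDim, D1)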
May 18, 2015 at 4:17 AM
Oh, your second epoch's learning rate is 10 times larger than the first epoch's?
May 18, 2015 at 5:35 AM
Dear Dongyu,

I am currently using a dropoutRate of 0.5 for the first 10 epochs and then 0.2 for the rest. Yes, the learning rate was increased 10 times for the second epoch; I was using the autoAdjust setup to increase and decrease the learning rate. I agree that I might not get the best results with the parameters used. My concern is that I have run the same experiment without Dropout and did not come across any issues.
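
Roughly, the relevant part of my SGD block looks like this (a sketch; the exact rates and autoAdjust factors here are illustrative, not copied from my config):

SGD = [
    # dropout schedule: 0.5 for the first 10 epochs, then 0.2
    dropoutRate = 0.5*10:0.2
    learningRatesPerSample = 0.00039063    # illustrative starting rate
    autoAdjust = [
        autoAdjustLR = adjustAfterEpoch
        learnRateIncreaseFactor = 1.382    # illustrative, not my exact factor
        learnRateDecreaseFactor = 0.618    # illustrative, not my exact factor
    ]
]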


Thank you for your input. I will check the log file again with your suggestions and try to identify any issues with the training of the minibatches.