TrainLossPerSample = nan

Nov 26, 2015 at 1:56 PM
I use NDL to create a network in which I call the convolution function. With CrossEntropyWithSoftmax as the training criterion the network trains fine, but when I change it to SquareError (the swap is sketched after the node dump below), I get:

Epoch[1 of 140]-Minibatch[1-10 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 21.692938; EvalErr[0]PerSample = 0.49849999; TotalTime=0.442162; …
Epoch[1 of 140]-Minibatch[11-20 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 1258057.1; EvalErr[0]PerSample = 0.4885; TotalTime=0.250898; …
Epoch[1 of 140]-Minibatch[21-30 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 97133224; EvalErr[0]PerSample = 0.49950001; TotalTime=0.254124; …
Epoch[1 of 140]-Minibatch[31-40 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 3.7230273e+19; EvalErr[0]PerSample = 0.5; TotalTime=0.244796; …
Epoch[1 of 140]-Minibatch[41-50 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 9.6748813e+22; EvalErr[0]PerSample = 0.5115; TotalTime=0.332546; …
Epoch[1 of 140]-Minibatch[51-60 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 5.0003218e+23; EvalErr[0]PerSample = 0.47999999; TotalTime=0.337931; …
Epoch[1 of 140]-Minibatch[61-70 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 2.7006239e+28; EvalErr[0]PerSample = 0.505; TotalTime=0.235128; …
Epoch[1 of 140]-Minibatch[71-80 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = 1.8081913e+30; EvalErr[0]PerSample = 0.4905; TotalTime=0.234829; …
Epoch[1 of 140]-Minibatch[81-90 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = nan; EvalErr[0]PerSample = 0.5; TotalTime=0.27526; …
Epoch[1 of 140]-Minibatch[91-100 of 515396075]: SamplesSeen = 2000; TrainLossPerSample = nan; EvalErr[0]PerSample = 0.5; TotalTime=0.238847; …
WARNING: The same matrix with dim [1, 1] has been transferred between different devices for 20 times.

Computation Node Order:

SEcri[0, 0] = SquareError(labels[2, 200], OutputNodes.z[0, 0])
OutputNodes.z[0, 0] = Plus(OutputNodes.t[0, 0], OutputNodes.b[2, 1])
OutputNodes.b[2, 1] = LearnableParameter
OutputNodes.t[0, 0] = Times(OutputNodes.W[2, 128], h1_d[0, 0])
h1_d[0, 0] = Dropout(conv1_act.act[0, 0])
conv1_act.act[0, 0] = Sigmoid(conv1_act.convPlusB[0, 0])
conv1_act.convPlusB[0, 0] = Plus(conv1_act.conv[0, 0], conv1_act.convB[128, 1])
conv1_act.convB[128, 1] = LearnableParameter
conv1_act.conv[0, 0] = Convolution(conv1_act.convW[128, 1212], f.xNorm[0, 0])
f.xNorm[0, 0] = PerDimMeanVarNormalization(features[1212, 200], f.xMean[1212, 1], f.xStdDev[1212, 1])
f.xStdDev[1212, 1] = InvStdDev(features[1212, 200])
f.xMean[1212, 1] = Mean(features[1212, 200])
features[1212, 200] = InputValue
conv1_act.convW[128, 1212] = LearnableParameter
OutputNodes.W[2, 128] = LearnableParameter
labels[2, 200] = InputValue
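
For reference, the only difference between the two runs is the criterion node in my NDL, roughly like this (a minimal sketch; the node names follow the dump above, the rest of the network definition is omitted, and designating the criterion via the CriterionNodes special-node list is the standard NDL pattern):

# training criterion that works:
ce = CrossEntropyWithSoftmax(labels, OutputNodes.z)
CriterionNodes = (ce)

# swapped-in criterion that diverges as shown above:
SEcri = SquareError(labels, OutputNodes.z)
CriterionNodes = (SEcri)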

Has anybody run into this?