Regularization

Oct 28, 2014 at 4:16 PM
Is it possible to add L1 regularization into the optimization criterion?
I am trying to do this:
MeanSqrErr = SquareError(featOut, A2)
Criteria = Plus(MeanSqrErr, L1Reg(A1.Z.W));

CriteriaNodes = (Criteria)
EvalNodes = (MeanSqrErr)
But CNTK keeps throwing this error:
The input matrix dimensions do not match.
Coordinator
Oct 29, 2014 at 5:36 AM
Typically you would scale the MeanSqrErr and L1Reg values and then add them up. However, the way you are using it should also work. You might want to check the log file to see whether this happens at the validation stage or the matrix computation stage. If it's in the validation stage, it will tell you the dimensions of each input where the problem occurs. It's also possible that the problem is caused by the configuration of other nodes. If it's indeed this line that is causing the problem, we would like to see a repro so that we can fix it.

L1Reg is not efficient in CNTK, though, since we just use the subgradient instead of dual proximal algorithms.
Oct 29, 2014 at 5:56 AM
Thanks a lot for the answer.

The problem occurs right in the gradient computation phase of the first training epoch.
In the function ComputeInputPartialS, gradientValues has dimensions (1,1) while gradientOfL1Norm has dimensions (513,50), so an exception is thrown in the subsequent steps.
The setup works normally with L2Reg or without any regularization.

I have not read much about the gradient calculation of the regularization in CNTK yet. Is it also covered in the CNTKBook?

Here is the call stack
>   CNTKMath.dll!Microsoft::MSR::CNTK::Matrix<float>::AddElementProductOf Line 1681
    cn.exe!Microsoft::MSR::CNTK::MatrixL1RegNode<float>::ComputeInputPartialS Line 756
    cn.exe!Microsoft::MSR::CNTK::MatrixL1RegNode<float>::ComputeInputPartial Line 744
    cn.exe!Microsoft::MSR::CNTK::ComputationNode<float>::ComputeGradientForChildren Line 431
    cn.exe!Microsoft::MSR::CNTK::ComputationNetwork<float>::ComputeGradient Line 1530
    cn.exe!Microsoft::MSR::CNTK::SGD<float>::TrainOneEpoch Line 983
    cn.exe!Microsoft::MSR::CNTK::SGD<float>::TrainOrAdaptModel Line 558
    cn.exe!Microsoft::MSR::CNTK::SGD<float>::Train Line 364
    cn.exe!DoTrain<float> Line 436
    cn.exe!DoCommand<float> Line 514
And here is the last part of the log file:
No PreCompute nodes found, skipping PreCompute step
Set Max Temp Mem Size For Convolution Nodes to 0 samples.
WARNING: there is no convolution node.
minibatchiterator: epoch 0: frames [0..173612] (first utterance at frame 0) with 1 datapasses
randomordering: 0 retries for 173612 elements (0.0%) to ensure window condition
randomordering: recached sequence for seed 0: 62168, 28052, ...
recoverblock: recovering feature block 0 [0..65535)
recoverblock: recovering feature block 1 [65536..131071)
recoverblock: recovering feature block 2 [131072..196607)
recoverblock: recovering feature block 0 [0..65535)
recoverblock: recovering feature block 1 [65536..131071)
recoverblock: recovering feature block 2 [131072..196607)
EXCEPTION occurred: The input matrix dimensions do not match.
Oct 29, 2014 at 7:36 AM
Edited Oct 29, 2014 at 8:23 AM
Moreover, could you also tell me how to use the Scale node? I have tried two ways but could not make either work.
Reg = Scale(0.0001, L2Reg(A1.Z.W))
or
lambda = 0.0001
Reg = Scale(lambda, L2Reg(A1.Z.W))
I tried to read the code and found this:
case ndlTypeConstant:
case ndlTypeVariable:
    break;
where the function SetEvalValue of nodeParam is never called.
So later on the code breaks at this assertion, because m_eval is not set (m_value is still recognized correctly as "0.0001" in both cases):
if (pass == ndlPassResolve)
{
    void* np = nodeResult->GetEvalValue();
    assert(np != nullptr);
    inputs.push_back((void*)np);
}
I am not sure if this is the correct way to pass scalar parameters into the Scale node.
Coordinator
Oct 29, 2014 at 10:22 AM
You can fix the bug in the L1RegNode by replacing ComputeInputPartialS with the following function:
    static void WINAPI ComputeInputPartialS(Matrix<ElemType>& gradientOfL1Norm, 
        Matrix<ElemType>& inputGradientValues, const Matrix<ElemType>& gradientValues, const Matrix<ElemType>& inputFunctionValues)  
    {
        // Subgradient of the L1 norm: the elementwise sign of the input values.
        gradientOfL1Norm.AssignSignOf(inputFunctionValues);
        // gradientValues is 1x1 (the criterion is a scalar), so read it out
        // with Get00Element and broadcast it as a scale factor rather than
        // taking an element-wise product.
        inputGradientValues.AddWithScaleOf(gradientValues.Get00Element(), gradientOfL1Norm);
    }
You can find an example of using multiple objective functions at ExampleSetups\ASR\TIMIT\config\mtl_senones_dr.ndl. That example uses a Times node instead of a Scale node, but you can achieve the same thing with the Scale node by defining a constant first. The Scale node also has a bug; you should replace the corresponding function:
    static void WINAPI EvaluateThisNodeS(Matrix<ElemType>& functionValues, const Matrix<ElemType>& input0, const Matrix<ElemType>& input1)  
    {
        // input0 is the 1x1 scale factor; broadcast it over input1.
        functionValues.AssignProductOf(input0.Get00Element(), input1);
#if NANCHECK
        functionValues.HasNan("Scale");
#endif
    }
Oct 30, 2014 at 3:31 AM
I have updated the code and followed the weighted objective function from the example in ExampleSetups\ASR\TIMIT\config\mtl_senones_dr.ndl, and it works very well.
Thank you very much for your help.