I have started using CNTK for some of my tasks. It is quite interesting, and training NNs with it is very simple.
One of the things I have been trying is to build semantic word embeddings using DSSM.
You can find my (non-working, experimental) configuration and NDL files
Compared to the original work, I am using a small dataset to begin with: about 10K documents and a vocabulary of 17K words. Word hashing with the letter-trigram character representation reduces the one-hot input dimensionality to about 8K. This is the input to the model described in the configuration and NDL files.
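For reference, the letter-trigram hashing follows the DSSM paper: each word is wrapped in boundary markers and decomposed into overlapping character trigrams, and each trigram then indexes into the ~8K trigram vocabulary to form a sparse multi-hot vector. A minimal sketch of the decomposition step (the function name and the '#' marker are my own illustrative choices, not CNTK code):

```cpp
#include <string>
#include <vector>

// Decompose a word into letter trigrams as in the DSSM paper:
// "good" -> "#good#" -> {"#go", "goo", "ood", "od#"}.
std::vector<std::string> LetterTrigrams(const std::string& word)
{
    std::string marked = "#" + word + "#"; // word-boundary markers
    std::vector<std::string> trigrams;
    for (size_t i = 0; i + 3 <= marked.size(); ++i)
        trigrams.push_back(marked.substr(i, 3));
    return trigrams;
}
```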
At the moment I am facing the following problems in training the model:
- The DSSMReader available in the CNTK source uses some Windows-specific APIs. I am trying to port it to Linux by replacing the Windows file-mapping APIs with mmap (see the first sketch after this list); has this already been tried by someone? Also, what is the format of data expected by DSSMReader?
- As an alternative, I am trying to use the UCIFastReader in my configuration. Each line of the UCI file stores the concatenation of the query and document input vectors, which is sliced apart in the NDL file; the labels in the file are dummies and are not used in the model. However, with this approach I found that the UCIFastReader does not read all the lines/records in the UCI input file. A little debugging of the source code showed that UCIParser uses _ftelli64() in the GetFilePosition() function to compute the size of the UCI file, and this fails to give the true file size on my machine (Linux 3.13.0-52-generic #86-Ubuntu 64-bit, gcc 4.8.2). I replaced _ftelli64() with ftell() and now it reads the entire file. (Yes, this is not the most portable solution; something like stat() would be better, as in the second sketch after this list.) As I debug further, it seems some variables in UCIParser (like m_totalNumbersConverted) should also be changed to int64_t. Before making these changes for myself, I would like to confirm whether this is really an issue or something wrong with my setup.
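For the first problem, this is the direction I am taking for the port: a minimal sketch of replacing the Windows file-mapping calls (CreateFileMapping/MapViewOfFile) with POSIX mmap. The names here are my own, not the actual DSSMReader code:

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>

// Map a whole file read-only into memory, roughly what
// CreateFileMapping + MapViewOfFile do on Windows.
// Returns nullptr on failure; 'size' receives the file length.
const char* MapFileReadOnly(const char* path, int64_t& size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) // mmap of length 0 fails
    {
        close(fd);
        return nullptr;
    }
    size = st.st_size;

    void* p = mmap(nullptr, static_cast<size_t>(size), PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping remains valid after closing the descriptor
    if (p == MAP_FAILED)
        return nullptr;
    return static_cast<const char*>(p);
}

// Release later with munmap(const_cast<char*>(ptr), size).
```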
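For the second problem, rather than swapping _ftelli64() for ftell() (ftell() returns a long, which truncates on some platforms), querying the size with stat() avoids seeking entirely. A sketch of what I have in mind (not the actual UCIParser code):

```cpp
#include <sys/stat.h>
#include <cstdint>

// Portable 64-bit file size via stat(), avoiding fseek/ftell,
// whose long return type can truncate large files.
// Returns -1 on error.
int64_t GetFileSize64(const char* path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return static_cast<int64_t>(st.st_size);
}
```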
Finally, I would be glad to learn about any mistakes or inefficient steps in my network configuration (especially in the calculation of the loss function).