About symmetric context windows

Oct 19, 2015 at 1:49 PM
Edited Oct 19, 2015 at 1:50 PM
When using spectral power features (257 dimensions) as the input feature vector,
an 11-frame context window (2827 dimensions) becomes very expensive.
The CNTK book says that symmetric context windows are supported,
but are they only supported by the Simple Network Builder through lookupTableOrder?
If so, the other option is to use 'blockRandomize'.
However, when I run CNTK with the configuration below,
the program dies from lack of memory (my machine has 32 GB of RAM).
The CNTK book says that 'blockRandomize' is appropriate for large corpora
because the data in each block is read directly from the feature files,
but in my case it seems that all the data is being loaded into RAM.
reader=[
    readerType=HTKMLFReader
    readMethod=blockRandomize
    miniBatchMode=Partial
    verbosity=1

    featIn=[
        dim=2827
        scpFile=ach/mfc.multi.ach
    ]

    featOut=[
        scpFile=ach/mfc.clean.ach
        dim=40
    ]
]
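The 2827 dimensions above are 11 × 257, i.e. the center frame plus 5 frames on each side. As a rough illustration only (this is not CNTK's reader code), a symmetric context window can be sketched in NumPy as below. `splice_frames` is a hypothetical helper, and repeating the first/last frame at utterance boundaries is an assumption about boundary handling.

```python
import numpy as np


def splice_frames(feats: np.ndarray, left: int = 5, right: int = 5) -> np.ndarray:
    """Stack each frame with `left` preceding and `right` following frames.

    feats: (num_frames, base_dim) array.
    Returns a (num_frames, (left + 1 + right) * base_dim) array.
    Boundary frames are repeated (edge padding), which is an assumption here.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Column block k of output row t holds input frame t + k - left (clamped at the edges).
    windows = [padded[k:k + num_frames] for k in range(left + 1 + right)]
    return np.concatenate(windows, axis=1)


rng = np.random.default_rng(0)
feats = rng.random((100, 257), dtype=np.float32)  # 100 frames of 257-dim spectral power
spliced = splice_frames(feats)
print(spliced.shape)  # (100, 2827)
```

The network only ever sees the 2827-dim spliced vector, which is why the configured `dim` is the full window size rather than the per-frame size.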
Nov 2, 2015 at 9:04 PM
Symmetric context windows are supported simply by specifying the larger context-window dimension in the feature block, as you have done. You are running out of memory for a different reason. In the blockRandomize reader, the "randomize" parameter decides how much data is loaded into memory at once. The default, "randomize=auto", tries to load the entire data set into memory, so a large data set combined with a large feature vector (257 dimensions) can exhaust your RAM. You can set "randomize" explicitly to limit how much data is read into memory at once. For example, we typically load 48 hours of data at a time (blocks of data are paged into and out of memory as needed). At 100 frames per second this is 48 × 3600 × 100, or about 17M frames (randomize=17280000). For 257-dimensional features this is about 17 GB, so choose a smaller value if your setup requires it.
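The arithmetic behind these numbers, plus one way to pick a smaller `randomize` value for a given RAM budget, can be sketched as follows. This assumes a 10 ms frame shift and 4-byte float features, and it counts only raw feature storage (no reader bookkeeping), so treat the result as a lower bound. The helper names and the 8 GB budget are hypothetical examples.

```python
def frames_for_hours(hours: float, frame_shift_ms: float = 10.0) -> int:
    """Number of frames in `hours` of audio at the given frame shift."""
    return round(hours * 3600 * 1000 / frame_shift_ms)


def feature_bytes(num_frames: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for num_frames feature vectors of size dim (float32 by default)."""
    return num_frames * dim * bytes_per_value


def frames_for_budget(budget_bytes: int, dim: int, bytes_per_value: int = 4) -> int:
    """Largest frame count whose raw features fit in budget_bytes (ignores reader overhead)."""
    return budget_bytes // (dim * bytes_per_value)


frames_48h = frames_for_hours(48)
print(frames_48h)                                         # 17280000
print(f"{feature_bytes(frames_48h, 257) / 1e9:.2f} GB")  # 17.76 GB

# Hypothetical budget: keep raw 257-dim features under 8 GB on a 32 GB machine.
print(frames_for_budget(8 * 10**9, 257))                  # 7782101
```

Under these assumptions, an 8 GB budget corresponds to roughly 21.6 hours of audio, which could be passed as the `randomize` frame count.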
Nov 4, 2015 at 2:25 AM
Thank you very much. Your comments are really helpful to me.
So you mean that a context window can be applied simply by specifying the total dimension of the concatenated feature,
because the dimension of each frame can be calculated from the file size and the frame length provided by the ach file?
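One way such an inference could work, sketched here under the assumption that the feature files are standard uncompressed HTK parameter files (this is not a description of HTKMLFReader's actual code): the 12-byte HTK header stores the bytes per frame, so the per-frame dimension is that size divided by 4, and the window length is the configured `dim` divided by the per-frame dimension. `read_htk_header` and `context_window_frames` are hypothetical helpers.

```python
import struct

# Standard HTK parameter-file header: nSamples (int32), sampPeriod (int32, in
# 100 ns units), sampSize (int16, bytes per frame), parmKind (int16).
# HTK files are big-endian by default.
HTK_HEADER = struct.Struct(">iihh")


def read_htk_header(data: bytes) -> tuple:
    """Return (n_samples, samp_period, samp_size, parm_kind) from raw file bytes."""
    return HTK_HEADER.unpack_from(data, 0)


def context_window_frames(total_dim: int, samp_size: int, bytes_per_value: int = 4) -> int:
    """Infer how many frames are spliced together from the configured total dimension.

    Assumes uncompressed features stored as 4-byte floats (no HTK compression flag).
    """
    base_dim = samp_size // bytes_per_value
    if total_dim % base_dim != 0:
        raise ValueError(f"dim={total_dim} is not a multiple of the per-frame dim {base_dim}")
    frames = total_dim // base_dim
    if frames % 2 == 0:
        raise ValueError(f"{frames} frames cannot form a symmetric window")
    return frames


# Build a fake 100-frame file of 257-dim features in memory (parmKind 9 = USER).
header = HTK_HEADER.pack(100, 100000, 257 * 4, 9)
data = header + bytes(100 * 257 * 4)

n_samples, samp_period, samp_size, parm_kind = read_htk_header(data)
# The file size agrees with the header: 12 header bytes + n_samples * samp_size.
assert len(data) == HTK_HEADER.size + n_samples * samp_size

window = context_window_frames(2827, samp_size)
print(window, (window - 1) // 2)  # 11 5
```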
As for the memory problem, I understand now. ^^

Best regards