Follow-up: Compiling linux-gcc fork

Jan 27, 2015 at 12:31 AM
As reported previously, I was able to build cn.exe with ACML 5.3.1 under gcc 4.8.2 (Ubuntu 14.04). I edited Demos/Simple/Simple.config to use the CPU only, added ~/CNTK/bin to LD_LIBRARY_PATH, and, from ~/CNTK/Demos/Simple, ran ../../bin/cn.exe configFile=Simple.config
It ran and produced a log file. Training convergence was similar, but not identical, to the results on Windows. However, it ran MUCH slower than the Windows version, by a factor of more than 50, even though both were running CPU-only. Why the large speed difference?
Richard
Jan 27, 2015 at 1:00 AM
Are you compiling with the debug option? That may be the reason. Right now, we are not focusing much on computation time.
Jan 27, 2015 at 1:41 AM
I just tried building with

The .o files compile fine, but when linking cn.exe itself I get many errors like this:

building output for x86_64 with build type release ...
g++ -O4 -o bin/x86_64.cpu.release.acml/cn.exe .build/x86_64.cpu.release.acml/MachineLearning/cn/NetworkDescriptionLanguage.o .build/x86_64.cpu.release.acml/MachineLearning/cn/cn.o .build/x86_64.cpu.release.acml/MachineLearning/cn/ComputationNode.o .build/x86_64.cpu.release.acml/MachineLearning/cn/ModelEditLanguage.o .build/x86_64.cpu.release.acml/MachineLearning/cn/PTaskGraphBuilder.o .build/x86_64.cpu.release.acml/MachineLearning/cn/SimpleNetworkBuilder.o .build/x86_64.cpu.release.acml/MachineLearning/cn/tests.o .build/x86_64.cpu.release.acml/MachineLearning/CNTKEval/CNTKEval.o .build/x86_64.cpu.release.acml/Math/Math/Matrix.o .build/x86_64.cpu.release.acml/Math/Math/CPUMatrix.o .build/x86_64.cpu.release.acml/Math/Math/CPUSparseMatrix.o .build/x86_64.cpu.release.acml/Math/Math/NoGPU.o .build/x86_64.cpu.release.acml/Common/fileutil.o .build/x86_64.cpu.release.acml/Common/DataWriter.o .build/x86_64.cpu.release.acml/Common/ConfigFile.o .build/x86_64.cpu.release.acml/Common/DataReader.o .build/x86_64.cpu.release.acml/Common/Eval.o .build/x86_64.cpu.release.acml/Common/File.o .build/x86_64.cpu.release.acml/Common/BestGpu.o -L/opt/acml5.3.1/ifort64_int64/lib -lacml -lm -lpthread -fopenmp -ldl
.build/x86_64.cpu.release.acml/MachineLearning/cn/NetworkDescriptionLanguage.o: In function `Microsoft::MSR::CNTK::NegateNode<float>::GetPTaskDescriptor(Microsoft::MSR::CNTK::TaskType, unsigned long) const':
NetworkDescriptionLanguage.cpp:(.text._ZNK9Microsoft3MSR4CNTK10NegateNodeIfE18GetPTaskDescriptorENS1_8TaskTypeEm[_ZNK9Microsoft3MSR4CNTK10NegateNodeIfE18GetPTaskDescriptorENS1_8TaskTypeEm]+0x47): undefined reference to `Microsoft::MSR::CNTK::TaskDescriptor<float>::GradientParam(int, unsigned int, float)'
NetworkDescriptionLanguage.cpp:(.text._ZNK9Microsoft3MSR4CNTK10NegateNodeIfE18GetPTaskDescriptorENS1_8TaskTypeEm[_ZNK9Microsoft3MSR4CNTK10NegateNodeIfE18GetPTaskDescriptorENS1_8TaskTypeEm]+0x5c): undefined reference to `Microsoft::MSR::CNTK::TaskDescriptor<float>::GradientParam(int, unsigned int, float)'
.build/x86_64.cpu.release.acml/MachineLearning/cn/NetworkDescriptionLanguage.o: In function `Microsoft::MSR::CNTK::SumElementsNode<float>::GetPTaskDescriptor(Microsoft::MSR::CNTK::TaskType, unsigned long) const':
...
There are some undefined references in NetworkDescriptionLanguage.o. Can you fix the problem from the errors above?

Thanks,
Richard


Read the full discussion online.

To add a post to this discussion, reply to this email ([email removed])

To start a new discussion for this project, email [email removed]

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on CodePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at CodePlex.com
Jan 27, 2015 at 2:32 AM
I guess optimized builds do not work right now. Just remove -O4 from Makefile.cpu and recompile. Even so, it will probably be slower on Linux.
The Linux/gcc port is still experimental, and a lot of work is needed to make it perform close to the Windows version.

Hakan
Jan 27, 2015 at 3:37 AM
Thanks. Removing -O4 makes the build work. Why was it causing a problem?

Demos/Simple now runs a little faster than the debug build, but still much slower than on Windows.

What has to be done to make it run faster?

Richard



Jan 27, 2015 at 4:12 AM
It seems to be related to template functions. We may need to put both the declarations and the definitions in the header files, or there may be other solutions.


http://www.parashift.com/c++-faq/templates-defn-vs-decl.html
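The header-file approach suggested above can be sketched as follows. This is a minimal illustration, not CNTK code: GradientParamSketch and its Scaled member are hypothetical names, and the "header" and ".cpp" parts are shown in one file so it compiles on its own. Because the member is defined where it is declared, every translation unit that uses GradientParamSketch<float> gets its own inline copy, so the linker never has to find the code in another object file.

```cpp
// Sketch: all member definitions of a class template live in the header.
// GradientParamSketch is a hypothetical stand-in, not CNTK's TaskDescriptor.
#include <cassert>

// ---- would be GradientParamSketch.h ----
template <class ElemType>
class GradientParamSketch
{
public:
    explicit GradientParamSketch(ElemType scale) : m_scale(scale) {}

    // Defined inside the class body, hence implicitly inline: any .cpp that
    // includes the header can instantiate it for float or double on its own.
    ElemType Scaled(ElemType value) const { return value * m_scale; }

private:
    ElemType m_scale;
};

// ---- would be any .cpp that includes the header ----
inline void CheckGradientParamSketch()
{
    GradientParamSketch<float> f(2.0f);
    GradientParamSketch<double> d(0.5);
    assert(f.Scaled(3.0f) == 6.0f);
    assert(d.Scaled(4.0) == 2.0);
}
```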
Jan 27, 2015 at 4:15 AM
Jan 27, 2015 at 4:06 PM
It appears I was misled about the speed of the Linux release build of CNTK vs. Windows. The epoch times printed in the log file Demo_Simple_Demo_Simple_Demo_Output.log are incorrect, and much too large, on Linux. The actual total elapsed time to run 25 epochs of the Simple demo is 7.9 s on my Windows machine (CPU: i7 3770K in turbo mode at 4.1 GHz) and about 17 s on Linux (CPU: i7 920 at 2.81 GHz); most of the difference in speed may be accounted for by the different CPUs. But there does seem to be a bug in the way cn.exe calculates the epoch time.
Richard
Jan 27, 2015 at 5:07 PM
At 09:15 PM 1/26/2015, erdogan wrote:
This solution is better http://www.parashift.com/c++-faq/separate-template-class-defn-from-decl.html
I will try to do it.

Interesting. I wonder why it worked at all.

Please let me know your results when you implement this.

Richard




Jan 28, 2015 at 11:10 PM
Erdogan,

Thanks, I see that you fixed the template instantiation problem that caused an error in the release build. CNTK now compiles in release mode with the optimization flag -O4. It also runs much faster now, about twice as fast as the debug build: 0.23 s per epoch on the Simple demo in CPU mode on my Intel i7 920 CPU with MKL.
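The thread does not show the exact change that was made, but the second parashift link describes the usual alternative to header-only templates: keep the definitions in a .cpp file and explicitly instantiate the element types the program needs. A minimal sketch, assuming a hypothetical AccumulatorSketch class (header and .cpp parts again shown in one file):

```cpp
// Sketch: declarations in the header, definitions in one .cpp, plus
// explicit instantiations for the element types the program uses.
// AccumulatorSketch is hypothetical; the actual CNTK change may differ.
#include <cassert>
#include <vector>

// ---- would be AccumulatorSketch.h (declarations only) ----
template <class ElemType>
class AccumulatorSketch
{
public:
    void Add(ElemType value);
    ElemType Sum() const;

private:
    std::vector<ElemType> m_values;
};

// ---- would be AccumulatorSketch.cpp (definitions) ----
template <class ElemType>
void AccumulatorSketch<ElemType>::Add(ElemType value)
{
    m_values.push_back(value);
}

template <class ElemType>
ElemType AccumulatorSketch<ElemType>::Sum() const
{
    ElemType total = 0;
    for (ElemType v : m_values)
        total += v;
    return total;
}

// Explicit instantiation definitions: this object file always contains
// out-of-line code for float and double, whatever the optimization level,
// so other .o files that saw only the declarations can link against it.
template class AccumulatorSketch<float>;
template class AccumulatorSketch<double>;

// ---- would be some other .cpp that includes only the header ----
inline void CheckAccumulatorSketch()
{
    AccumulatorSketch<float> a;
    a.Add(1.5f);
    a.Add(2.5f);
    assert(a.Sum() == 4.0f);
}
```

The trade-off: explicit instantiation keeps headers small, but every new element type has to be added to the instantiation list by hand.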

===

The epoch time is still reported incorrectly in the log file (stderr). This is probably because clock() on Linux returns the CPU time summed over all threads of the process (e.g. the OpenMP worker threads), whereas on Windows it returns elapsed real time.
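One way to make the reported epoch time match the real elapsed time is to measure it with a monotonic wall clock such as std::chrono::steady_clock instead of clock(). A minimal sketch, assuming a hypothetical TimeEpoch helper that wraps one epoch of work; the check uses a sleep, which costs almost no CPU, to show that the two measurements can disagree:

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct EpochTimes
{
    double wallSeconds; // elapsed real time (what the log should report)
    double cpuSeconds;  // what std::clock() measures: process CPU time
};

// Hypothetical helper: runs one "epoch" and times it both ways.
template <class Fn>
EpochTimes TimeEpoch(Fn&& runEpoch)
{
    const std::clock_t cpuStart = std::clock();
    const auto wallStart = std::chrono::steady_clock::now();
    runEpoch();
    const auto wallEnd = std::chrono::steady_clock::now();
    const std::clock_t cpuEnd = std::clock();

    EpochTimes t;
    t.wallSeconds = std::chrono::duration<double>(wallEnd - wallStart).count();
    t.cpuSeconds = static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    return t;
}

// A 200 ms sleep uses almost no CPU, so the wall time is about 0.2 s while
// the clock()-based CPU time stays far below it.
inline void CheckTimingSketch()
{
    const EpochTimes t = TimeEpoch([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    });
    assert(t.wallSeconds >= 0.19);
    assert(t.cpuSeconds < t.wallSeconds);
}
```

With several busy OpenMP threads the discrepancy goes the other way: on Linux, clock() then advances faster than the wall clock, which would inflate the logged epoch times.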

===

Is GPU support available in the Linux build? I encountered this error when building in debug mode with Makefile.gpu:
Math/Math/GPUSparseMatrix.cu(2296): error: expected a type specifier
1 error detected in the compilation of "/tmp/tmpxft_00003cd8_00000000-6_GPUSparseMatrix.cpp1.ii".
===

I hope HTKMLFReader will soon be available so I can run some speech training with HTK data files. Do you know how soon that might be?

Richard



Jan 29, 2015 at 4:21 PM
I am working on HTKMLFReader, but it turns out to be hard to port to Linux. It will take time; I am not sure how much, since I am busy with other things as well.

I think Yu is working on the GPUMatrix code. I am not sure whether it is easy or difficult to correct the error you are seeing.

Hakan
Jan 29, 2015 at 5:45 PM
I think I know what this issue is; I'll fix it soon.
Jan 29, 2015 at 7:39 PM
The error is caused by a recent update that uses a Visual Studio/Windows built-in type, which the Linux compiler does not recognize.
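The thread does not name the type involved. As a hypothetical illustration, MSVC's built-in __int64 is a common example of a type that compiles on Windows but not with gcc or nvcc on Linux; a portable fix is to use a standard fixed-width type from <cstdint>, guarded if the Windows spelling must be kept. SparseIndexSketch and CountNonZerosSketch below are hypothetical names:

```cpp
// Hypothetical sketch: __int64 stands in for whatever Windows-only type
// the GPUSparseMatrix.cu change actually used.
#include <cassert>
#include <cstdint>

#ifdef _MSC_VER
typedef __int64 SparseIndexSketch;      // MSVC built-in; not available on Linux
#else
typedef std::int64_t SparseIndexSketch; // portable fixed-width equivalent
#endif

static_assert(sizeof(SparseIndexSketch) == 8, "index type must be 64-bit");

// Hypothetical helper that uses the portable typedef.
inline SparseIndexSketch CountNonZerosSketch(const float* values, SparseIndexSketch n)
{
    SparseIndexSketch count = 0;
    for (SparseIndexSketch i = 0; i < n; ++i)
        if (values[i] != 0.0f)
            ++count;
    return count;
}

inline void CheckCountNonZerosSketch()
{
    const float values[] = {0.0f, 1.5f, 0.0f, -2.0f};
    assert(CountNonZerosSketch(values, 4) == 2);
}
```

Since std::int64_t is also available with MSVC, the simplest option is often to drop the #ifdef and use the standard type everywhere.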