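So, as mentioned in this comment, one way to achieve fast DTW calculations is to use "lower bounds" in the DTW process: a cheap bound on the DTW distance lets you discard a candidate without running the full calculation. The UCR Suite is a great example of how this can be achieved.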
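To make the lower-bound idea concrete, here is a minimal sketch in C++ (the language of the original suite). It is not the UCR Suite's code and not my port. It assumes equal-length series, a squared-difference cost, and a warping window of half-width `r`. The function names `lb_keogh`, `dtw_banded` and `nearest` are made up for this illustration. The bound shown is the well-known LB_Keogh envelope bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Squared difference between two points (the per-cell cost assumed in this sketch).
static double sq(double a, double b) { return (a - b) * (a - b); }

// LB_Keogh-style bound: build an envelope of half-width r around the query q and
// sum how far the candidate c falls outside it. Never exceeds dtw_banded(q, c, r).
double lb_keogh(const std::vector<double>& q, const std::vector<double>& c, std::size_t r) {
    assert(q.size() == c.size());
    const std::size_t n = q.size();
    double lb = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t lo = (i > r) ? i - r : 0;
        const std::size_t hi = std::min(n - 1, i + r);
        const auto mm = std::minmax_element(q.begin() + lo, q.begin() + hi + 1);
        const double lower = *mm.first, upper = *mm.second;
        if (c[i] > upper) lb += sq(c[i], upper);
        else if (c[i] < lower) lb += sq(c[i], lower);
    }
    return lb;
}

// Full DTW restricted to cells with |i - j| <= r, using two rolling rows.
double dtw_banded(const std::vector<double>& q, const std::vector<double>& c, std::size_t r) {
    assert(q.size() == c.size());
    const std::size_t n = q.size();
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> prev(n + 1, inf), cur(n + 1, inf);
    prev[0] = 0.0;
    for (std::size_t i = 1; i <= n; ++i) {
        std::fill(cur.begin(), cur.end(), inf);
        const std::size_t lo = (i > r) ? i - r : 1;
        const std::size_t hi = std::min(n, i + r);
        for (std::size_t j = lo; j <= hi; ++j) {
            const double best = std::min({prev[j], cur[j - 1], prev[j - 1]});
            cur[j] = sq(q[i - 1], c[j - 1]) + best;
        }
        std::swap(prev, cur);
    }
    return prev[n];
}

// 1-NN search: skip the expensive DTW whenever the cheap bound already
// reaches the best distance found so far. Returns the index of the best candidate.
std::size_t nearest(const std::vector<double>& q,
                    const std::vector<std::vector<double>>& candidates, std::size_t r) {
    double best = std::numeric_limits<double>::infinity();
    std::size_t best_idx = 0;
    for (std::size_t k = 0; k < candidates.size(); ++k) {
        if (lb_keogh(q, candidates[k], r) >= best) continue;  // pruned, no DTW needed
        const double d = dtw_banded(q, candidates[k], r);
        if (d < best) { best = d; best_idx = k; }
    }
    return best_idx;
}
```

The saving comes from the `continue` in `nearest`: once a good match has been found, most candidates fail the cheap test and never reach `dtw_banded`. A real implementation would also compute the envelope once per query rather than recomputing it for every candidate as this sketch does.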
I wanted to post this on my blog a long time ago, but just didn't have the time to get everything done.
The UCR Suite is written in C++, and since moving pointers around is much easier there than in safe C# code, I understand the choice of language.
However, with some thought the same code could be written in C# and be even faster and more efficient.
So, since I am primarily a .NET coder and wanted to use only managed code with no unsafe code, I ported the C++ code to C#.
When built in the "Release" configuration, it is quite a bit faster than the C++ suite.
Prof. Eamonn Keogh and company did a great job optimizing DTW and the applications are endless.
I've placed the C# code on GitHub as a Visual Studio 2010 solution.
Get the code here!
This code is written without unsafe code.
I might publish another unsafe version for even better performance.