D_float?!? Ugh. The DtoT conversion sequence is the longest. Depending on your floating-point operations, it could be 10x slower than just using native T_float. Why use D? Do you have existing binary data in D_float that you need to keep using?