Quote from: satoshi on July 31, 2010, 12:29:20 AM
That’s amazing…
So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I’ve wondered about that for a long time, but I didn’t think it would be possible due to addition carrying into the neighbour’s value.
That’s how it works: four 32-bit values in a 128-bit vector. They’re calculated independently, but at the same time.
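To make the "no carry into the neighbour" point concrete, here is a minimal sketch. It is not the code from this thread: it assumes GCC or Clang and uses their generic vector extension rather than raw SSE2 intrinsics, and the names v4u32 and add4 are made up for illustration.

```cpp
#include <cstdint>

// GCC/Clang vector extension (assumption: one of those compilers is used).
// Four 32-bit lanes packed into a single 128-bit value.
typedef uint32_t v4u32 __attribute__((vector_size(16)));

// Hypothetical helper: adds the four lanes of a and b in one operation.
// Each lane wraps modulo 2^32 on its own; a carry out of one lane is
// discarded and never reaches the lane next to it.
inline v4u32 add4(v4u32 a, v4u32 b) {
    return a + b;
}
```

With a = {0xFFFFFFFF, 1, 2, 3} and b = {1, 10, 20, 30}, lane 0 overflows and becomes 0, while lane 1 is 11, not 12. On x86 with SSE2 enabled a compiler would typically turn this into a single packed 32-bit add.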
Btw, why are you using this alignup<16> function when __attribute__ ((aligned (16))) will tell the compiler to align at compile time?
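For comparison, a rough sketch of the two approaches side by side. The alignup<16> source isn't shown in this post, so align_up below is a hypothetical stand-in and only a guess at what such a helper does; aligned_buf and is_aligned16 are likewise made-up names. It assumes GCC or Clang for the attribute syntax.

```cpp
#include <cstddef>
#include <cstdint>

// Compile-time alignment: the compiler places this buffer on a 16-byte boundary.
static unsigned char aligned_buf[64] __attribute__((aligned(16)));

// Hypothetical run-time counterpart: rounds a pointer up to the next
// multiple of N bytes. N must be a non-zero power of two.
template <std::size_t N, typename T>
T* align_up(T* p) {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::uintptr_t u = reinterpret_cast<std::uintptr_t>(p);
    u = (u + (N - 1)) & ~static_cast<std::uintptr_t>(N - 1);
    return reinterpret_cast<T*>(u);
}

// Small check used to compare the two approaches.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

The attribute only covers objects the compiler lays out itself. A pointer into memory that came from somewhere else (a heap block, or an offset into a larger buffer) can only be rounded up at run time, and the caller has to over-allocate by up to N - 1 bytes to leave room for that.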