Re: 4 hashes parallel on SSE2 CPUs for 0.3.6

Figures: tcatm
Quote from: satoshi on July 30, 2010, 3:29:20 PM UTC

That’s amazing…

So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I’ve wondered about that for a long time, but I didn’t think it would be possible due to addition carrying into the neighbour’s value.

That’s how it works: four 32-bit values in a 128-bit vector. They’re calculated independently, but at the same time. The packed 32-bit add wraps each value on its own, so nothing carries into the neighbour’s value.
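To illustrate the point about carries, here is a minimal sketch using the standard SSE2 intrinsics from `<emmintrin.h>`. The helper name `add4` is made up for this example and is not taken from the patch. It only shows that `_mm_add_epi32` adds four 32-bit lanes independently.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

// Hypothetical helper (not from the patch): adds four pairs of 32-bit
// values with a single packed SSE2 add.
std::array<std::uint32_t, 4> add4(const std::array<std::uint32_t, 4>& a,
                                  const std::array<std::uint32_t, 4>& b) {
    // Unaligned loads, so the caller's arrays need no special alignment.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));

    // Each 32-bit lane wraps modulo 2^32 on its own; an overflow in one
    // lane does not carry into the next lane.
    __m128i sum = _mm_add_epi32(va, vb);

    std::array<std::uint32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), sum);
    return out;
}
```

For example, adding {0xFFFFFFFF, 1, 2, 3} and {1, 1, 1, 1} should give {0, 2, 3, 4}: the first lane overflows to 0 and the second lane is still 2, not 3.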

Btw, why are you using this alignup<16> function when __attribute__ ((aligned (16))) will tell the compiler to align at compile time?
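For reference, a minimal sketch of what the attribute looks like in use, assuming GCC or Clang (the syntax is compiler-specific). The names `state` and `is_aligned16` are made up for this example and do not come from the patch.

```cpp
#include <cstdint>

// Hypothetical buffer (not from the patch): the attribute asks the
// compiler to place it on a 16-byte boundary. GCC/Clang syntax.
static std::uint32_t state[4] __attribute__((aligned(16)));

// Hypothetical check: true if the pointer is a multiple of 16 bytes.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

With this declaration, `is_aligned16(state)` should return true without any run-time pointer adjustment.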