4 hashes parallel on SSE2 CPUs for 0.3.6

Participants: tcatm, Satoshi Nakamoto

This patch calculates four hashes at once on a single core using SSE2 vector instructions. A test program is included that validates the new hash function against the old one, so it should be correct.
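To give a rough idea of the technique, here is a minimal sketch. It is not code from the patch. It assumes an x86 compiler that provides <emmintrin.h>, and every function name in it is made up for illustration. It takes one SHA-256 building block (the "big sigma 0" function), computes it for four independent 32-bit words in one 128-bit SSE2 register, and checks each lane against the scalar version, in the same spirit as the included test program.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: rotate a 32-bit word right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, int n)
{
    return (x >> n) | (x << (32 - n));
}

/* Scalar SHA-256 "big sigma 0": ROTR 2 ^ ROTR 13 ^ ROTR 22. */
static uint32_t big_sigma0(uint32_t x)
{
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

/* SSE2 has no rotate instruction, so a rotate is built from two shifts
   and an OR. All four 32-bit lanes are rotated at the same time. */
static __m128i rotr32x4(__m128i x, int n)
{
    return _mm_or_si128(_mm_srli_epi32(x, n), _mm_slli_epi32(x, 32 - n));
}

/* Same function as big_sigma0, applied to four words in parallel. */
static __m128i big_sigma0x4(__m128i x)
{
    return _mm_xor_si128(_mm_xor_si128(rotr32x4(x, 2), rotr32x4(x, 13)),
                         rotr32x4(x, 22));
}

/* Hypothetical check: returns 1 if every vector lane equals the scalar
   result for the same input word, 0 otherwise. */
int sigma0_lanes_match(const uint32_t in[4])
{
    uint32_t out[4];
    int i;

    __m128i v = _mm_loadu_si128((const __m128i *)in);  /* unaligned load */
    _mm_storeu_si128((__m128i *)out, big_sigma0x4(v)); /* unaligned store */

    for (i = 0; i < 4; i++) {
        if (out[i] != big_sigma0(in[i]))
            return 0;
    }
    return 1;
}
```

Calling sigma0_lanes_match() with any four words should return 1. A full four-way hash would apply the same idea to every step of SHA-256, with each lane carrying the state of a different hash.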

The patch is against 0.3.6. It improves khash/s by roughly 115%.

http://pastebin.com/XN1JDb53