This patch calculates four hashes at a time on one core using vector instructions. A test program is included that validates the new hash function against the old one, so it should be correct.
The patch is against 0.3.6. It improves khash/s by roughly 115%.
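To give a feel for the idea, here is a minimal sketch, not code from the patch. It assumes GCC or Clang vector extensions as a portable stand-in for the actual vector instructions, and it borrows the SHA-256 small sigma0 function purely as an example of a hash building block (the hash used by the patch is not spelled out here). All names (`v4u32`, `sigma0_x4`, `sigma0_lanes_match`) are made up for illustration. One vector holds the same word from four independent inputs, every operation works on all four lanes at once, and a scalar version serves as the reference to check against, in the same spirit as the included test program.

```c
#include <stdint.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension).
 * Assumption: this stands in for whatever vector instructions the patch uses. */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Scalar reference: rotate right, then SHA-256 small sigma0
 * sigma0(x) = ROTR7(x) ^ ROTR18(x) ^ SHR3(x). Valid for 0 < n < 32. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

static uint32_t sigma0_scalar(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

/* Broadcast one value into all four lanes. */
static v4u32 splat(uint32_t n)
{
    v4u32 v = { n, n, n, n };
    return v;
}

/* Same operations as above, applied to four lanes at once. */
static v4u32 rotr32x4(v4u32 x, unsigned n)
{
    return (x >> splat(n)) | (x << splat(32 - n));
}

static v4u32 sigma0_x4(v4u32 x)
{
    return rotr32x4(x, 7) ^ rotr32x4(x, 18) ^ (x >> splat(3));
}

/* Validate the 4-way version against the scalar one.
 * Returns 1 if every lane matches, 0 otherwise. */
static int sigma0_lanes_match(const uint32_t in[4])
{
    v4u32 v = { in[0], in[1], in[2], in[3] };
    v4u32 r = sigma0_x4(v);
    for (int i = 0; i < 4; i++) {
        if (r[i] != sigma0_scalar(in[i]))
            return 0;
    }
    return 1;
}
```

Calling `sigma0_lanes_match` with any four input words should return 1. A full 4-way hash would apply the same lane-per-input layout to every step of the hash function, so four candidate inputs advance through it together.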