You should try it with tcatm’s 4-way SSE2 SHA in sha256.cpp. It compiles fine as C; just rename sha256.cpp to sha256.c. I got it working in simple tests on Windows, but not when linked into Bitcoin. It may have a better chance of working as part of a C program than a C++ one.
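For anyone unfamiliar with the term, "4-way" here means running four independent SHA-256 computations side by side, one per 32-bit lane of a 128-bit SSE2 register. The snippet below is only a rough sketch of that idea and is not taken from tcatm's code. The names ch_4way, rotr_4way and lanes_match_scalar are made up for illustration. It assumes an x86 target with SSE2 (on 32-bit GCC that means building with -msse2).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar SHA-256 Ch function: per bit, take y where x is 1, otherwise z. */
static uint32_t ch_scalar(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* Scalar 32-bit rotate right, valid for 0 < n < 32. */
static uint32_t rotr_scalar(uint32_t x, int n)
{
    return (x >> n) | (x << (32 - n));
}

/* Hypothetical helper: Ch on four independent 32-bit lanes at once. */
static __m128i ch_4way(__m128i x, __m128i y, __m128i z)
{
    /* _mm_andnot_si128(a, b) computes (~a) & b */
    return _mm_xor_si128(_mm_and_si128(x, y), _mm_andnot_si128(x, z));
}

/* Hypothetical helper: rotate each 32-bit lane right by n (0 < n < 32).
   SSE2 has no rotate instruction, so combine two shifts with an OR. */
static __m128i rotr_4way(__m128i x, int n)
{
    return _mm_or_si128(_mm_srl_epi32(x, _mm_cvtsi32_si128(n)),
                        _mm_sll_epi32(x, _mm_cvtsi32_si128(32 - n)));
}

/* Returns 1 if every lane of the 4-way results equals the scalar result
   for the same inputs, 0 otherwise. */
static int lanes_match_scalar(void)
{
    uint32_t x[4] = {0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au};
    uint32_t y[4] = {0x510e527fu, 0x9b05688cu, 0x1f83d9abu, 0x5be0cd19u};
    uint32_t z[4] = {0x00000000u, 0xffffffffu, 0x12345678u, 0x80000001u};
    uint32_t ch_out[4], rot_out[4];
    int i;

    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    __m128i vz = _mm_loadu_si128((const __m128i *)z);

    _mm_storeu_si128((__m128i *)ch_out, ch_4way(vx, vy, vz));
    _mm_storeu_si128((__m128i *)rot_out, rotr_4way(vx, 7));

    for (i = 0; i < 4; i++) {
        if (ch_out[i] != ch_scalar(x[i], y[i], z[i]))
            return 0;
        if (rot_out[i] != rotr_scalar(x[i], 7))
            return 0;
    }
    return 1;
}
```

Calling lanes_match_scalar() from a small test program is one way to check that a build handles the intrinsics correctly before trying to link anything larger.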
I’ll take a look. VIA PadLock support may be similarly easy to integrate.