If I didn’t know better, I would say the key is the CPU cache size. It seems all the CPUs that run slower have 2 MB or less of onboard cache, whereas the Core i5 starts with at least 3 MB of onboard CPU cache.
That’s unlikely. The loop accesses 432 bytes of data. That should fit in most caches.
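To put rough numbers on that, here is a back-of-the-envelope sketch. The 432-byte figure is from above; the 64-byte cache line and 32 KiB L1 data cache are my assumptions (common on x86, but not stated anywhere in this thread), and the data is assumed to be contiguous and line-aligned.

```python
import math

WORKING_SET_BYTES = 432            # data touched by the loop (figure from the thread)
CACHE_LINE_BYTES = 64              # assumption: typical x86 cache line size
L1D_BYTES = 32 * 1024              # assumption: typical per-core L1 data cache
SMALLEST_REPORTED_CACHE = 2 * 1024 * 1024  # the "2 MB" cache mentioned in the thread


def cache_lines_needed(nbytes: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Cache lines occupied by a contiguous, line-aligned buffer of nbytes."""
    return math.ceil(nbytes / line_bytes)


lines = cache_lines_needed(WORKING_SET_BYTES)
l1_fraction = WORKING_SET_BYTES / L1D_BYTES
big_cache_fraction = WORKING_SET_BYTES / SMALLEST_REPORTED_CACHE

print(f"cache lines needed: {lines}")
print(f"share of assumed L1d: {l1_fraction:.2%}")
print(f"share of a 2 MB cache: {big_cache_fraction:.4%}")
```

Under those assumptions the working set is 7 cache lines, a small fraction of even the L1 data cache, so the difference between 2 MB and 3 MB of last-level cache should not matter for this loop.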