On 2021-10-04 at 08:20:44, Jeff King wrote:
> Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
> actually made it _faster_ than not having an algo field. Now I'm
> super-confused. I could believe that it gave us some better alignment,
> but the original struct was 32 bytes. 36 seems like a strictly worse
> number.

My guess is that the increased alignment means that memcpy can perform
much better. Just because x86 has "fast" unaligned access doesn't mean
it's free; there remains a penalty for that, although other architectures
which support unaligned access have much worse ones. memcpy and memcmp
will perform better when they can use 32-bit chunks to read without having
to process the potentially unaligned pieces at the beginning and end.

For the record, I have no particular stylistic opinion about whether we
should adopt the proposed patch, but of course if it's faster as it is,
we should probably leave it.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA