On 04/28/2011 12:30 PM, Pekka Enberg wrote: > Hi, > > On 4/28/11 1:19 PM, Erik Faye-Lund wrote: >> On Thu, Apr 28, 2011 at 12:10 PM, Pekka Enberg<penberg@xxxxxxxxxxxxxx> wrote: >>> On 4/28/11 12:50 PM, Erik Faye-Lund wrote: >>>>> >>>>> Alas, i have not seen these sha1 hash buffers being allocated unaligned >>>>> (in my >>>>> very limited testing). In which spots are they allocated unaligned? >>>> >>>> Like I said above, it can happen when allocated on the stack. But it >>>> can also happen in malloc'ed structs, or in global variables. An array >>>> is aligned to the size of it's base member type. But malloc does >>>> worst-case-allignment, because it happens at run-time without >>>> type-information. >>> >>> I'd be very surprised if malloc() did "worst case alignment" - that'd suck >>> pretty badly from performance point of view. >> >> From POSIX (I don't have K&R at hand, but it's also specified there): >> "The pointer returned if the allocation succeeds shall be suitably >> aligned so that it may be assigned to a pointer to any type of object >> and then used to access such an object in the space allocated (until >> the space is explicitly freed or reallocated)." >> >> I put it in quotes because it's not the worst-case alignment you can >> ever think of, but rather the worst case alignment of your CPUs >> alignment requirements. This is 4 bytes for most CPUs. > > That's just the minimum guarantee! Why do you think modern malloc() implementations don't try *very* hard to provide best possible alignment? > >>> Stack allocation alignment is a harder issue but I doubt it's as bad as you >>> make it out to be. On x86, for example, stack pointer is almost always 8 or >>> 16 byte aligned with compilers whose writers have spent any time reading the >>> Intel optimization manuals. >>> >>> So yes, your statements are absolutely correct but I strongly doubt it >>> matters that much in practice unless you're using a really crappy >>> compiler... >> >> I'm sorry, but the the fact of the matter is that we don't write code >> for one compiler, we try to please many. Crappy compilers are very >> much out there in the wild, and we have to deal with it. So, we can't >> depend on char-arrays being aligned to 32-bytes. This code WILL break >> on GCC for ARM, so it's not a theoretical issue at all. It will also >> most likely break on GCC for x86 when optimizations are disabled. > > Yes, ARM is a problem and I didn't try to claim otherwise. However, it's not "impossible to fix" as you say with memalign(). > #define is_aligned(ptr) (ptr & (sizeof(void *) - 1)) if (is_aligned(sha1) && is_aligned(sha2)) return aligned_and_fast_hashcmp(sha1, sha2); return memcmp(sha1, sha2, 20); Problem solved for all architectures. Not as fast as the original patch when we're lucky with alignment, but we cater to sucky compilers and make the good ones go a lot faster. The really good compilers that recognizes "is it aligned?" checks will optimize the is_aligned() checks away or at least hint at the branch prediction which path it should prefer. Once again; Bear in mind that x86 style architectures with gcc is almost certainly the most common combo for git users by a very wide margin, so a 25-30% speedup for those users is pretty worthwhile. -- Andreas Ericsson andreas.ericsson@xxxxxx OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 Considering the successes of the wars on alcohol, poverty, drugs and terror, I think we should give some serious thought to declaring war on peace. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html