Hello,

On Fri, 24 May 2013, Julian Anastasov wrote:

> On Thu, 23 May 2013, Simon Kirby wrote:
>
> > Hmm, I was comparing atomic_t being s32 versus u32, not u64 being u64. :)
> > Anyway, the .s results are much easier to read, and (closer to) reality!
> > I did a comparison of (__u64)loh * atomic_read(&dest->weight) versus
> > (__u64)loh * (__u32)atomic_read(&dest->weight) on both arches and uploaded
> > the results to http://0x.ca/sim/ref/3.9-ipvs/. It's not a huge difference,
> > but I prefer the shorter/faster version. ;)
>
> 	I now see why your patch shows a difference compared
> to my tests a month ago. This change is the culprit:
>
> -	int loh, doh;
> +	unsigned int loh, doh;
>
> 	It effectively changes the operation from
>
> 	(__u64/__s64) int * int
>
> 	into
>
> 	(__u64) unsigned int * int
>
> 	and that is why the __u32 cast fixes it:
>
> 	(__u64) unsigned int * unsigned int
>
> 	so that both operands have the same 4-byte signedness.
>
> 	I think we should keep loh and doh as int; both of the
> following solutions should generate a 32x32 multiply:
>
> 1. Same as my first email:
>
> 	int loh, doh;
>
> 	(__u64/__s64) loh * atomic_read(&dest->weight)
>
> 	In this case I see only one difference between
> __u64 and __s64:
>
> -	jb	.L41	#,
> -	ja	.L79	#,
> +	jl	.L41	#,
> +	jg	.L79	#,
>
> 2. Your patch:
>
> 	unsigned int loh, doh;
>
> 	(__u64) loh * (__u32) atomic_read(&dest->weight)
> 	or
> 	(__s64) loh * (__u32) atomic_read(&dest->weight)
>
> 	The two solutions generate code that differs only in
> imul vs. mul, and from what I can find, imul is preferred
> over (faster than) mul. That is why I prefer solution 1;
> it also has fewer casts.
>
> 	So, I think you can change your patch as follows:
>
> 1. Use int for loh, doh. Note that some schedulers use
> 'unsigned int' and should be patched for this definition:
> NQ, SED, WLC.
>
> 2. Use the (__u64) prefix only, with no (__u32) before
> atomic_read, in: LBLC, LBLCR, NQ, SED, WLC:
>
> 	(__u64) loh * atomic_read(&dest->weight) ...
> 	(__u64) doh * ...
>
> 3. Explain in the commit message that we find
> result64 = int32 * int32 faster than result64 = uint32 * uint32,
> and far better than a 64x64 multiply, which is a bit
> slower on older CPUs.

	Simon, any progress on this change? I can take over
and finish it if you prefer.

Regards

--
Julian Anastasov <ja@xxxxxx>
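
For reference, the three cast patterns discussed above can be reproduced
in a minimal standalone sketch (the file name, function names, and test
values here are mine for illustration; this is not the kernel code).
Built for 32-bit x86 with gcc -m32 -O2 -S, the first form should compile
to a single widening imull, the second to a single widening mull, and
the mixed form to a longer full 64x64 multiply sequence:

	/*
	 * mul_demo.c - standalone illustration of the cast patterns.
	 * Build and inspect the assembly with:
	 *
	 *	gcc -m32 -O2 -S mul_demo.c
	 */
	#include <stdint.h>
	#include <stdio.h>

	/* Solution 1: int * int with one 64-bit cast. Both operands
	 * are sign-extended 32-bit values, so gcc can use a single
	 * widening imull on i386.
	 */
	static int64_t mul_s32_s32(int loh, int weight)
	{
		return (int64_t)loh * weight;
	}

	/* Solution 2: unsigned loh with a 32-bit cast on the weight.
	 * Both operands are zero-extended 32-bit values, giving a
	 * single widening mull.
	 */
	static uint64_t mul_u32_u32(unsigned int loh, int weight)
	{
		return (uint64_t)loh * (uint32_t)weight;
	}

	/* The culprit: mixed signedness. loh is zero-extended while
	 * weight is sign-extended, so no single widening multiply
	 * applies and gcc falls back to a full 64x64 sequence.
	 */
	static uint64_t mul_mixed(unsigned int loh, int weight)
	{
		return (uint64_t)loh * weight;
	}

	int main(void)
	{
		printf("%lld %llu %llu\n",
		       (long long)mul_s32_s32(1000, 3),
		       (unsigned long long)mul_u32_u32(1000, 3),
		       (unsigned long long)mul_mixed(1000, 3));
		return 0;
	}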
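A rough sketch of what step 2 would look like in one of the WLC-style
schedulers (schematic only, not a verbatim hunk against ip_vs_wlc.c;
dest, least, and the surrounding loop are assumed from context):

	int loh, doh;
	...
	/* 64-bit compare of two 32x32 widening products; casting one
	 * operand to __u64 widens the whole multiplication, so no
	 * __u32 cast on atomic_read() is needed.
	 */
	if ((__u64)loh * atomic_read(&dest->weight) >
	    (__u64)doh * atomic_read(&least->weight))
		continue;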