Hello,

On Fri, 24 May 2013, Julian Anastasov wrote:

> On Thu, 23 May 2013, Simon Kirby wrote:
>
> > Hmm, I was comparing atomic_t being s32 versus u32, not u64 being u64. :)
> > Anyway, the .s results are much easier to read, and (closer to) reality!
> > I did a comparison of (__u64)loh * atomic_read(&dest->weight) versus
> > (__u64)loh * (__u32)atomic_read(&dest->weight) on both arches and uploaded
> > the results to http://0x.ca/sim/ref/3.9-ipvs/. It's not a huge difference,
> > but I prefer the shorter/faster version. ;)
>
> 	I now see why your patch shows a difference compared
> to my tests a month ago. This change is the culprit:
>
> -	int loh, doh;
> +	unsigned int loh, doh;
>
> 	It effectively changes the operation from
>
> 	(__u64/__s64) int * int
>
> 	into
>
> 	(__u64) unsigned int * int
>
> 	and that is why the __u32 cast fixes it:
>
> 	(__u64) unsigned int * unsigned int
>
> 	so that both operands have the same 4-byte signedness.
>
> 	I think we should keep loh and doh as int; both of the
> following solutions should generate a 32x32 multiply:
>
> 1. Same as my first email:
>
> 	int loh, doh;
>
> 	(__u64/__s64) loh * atomic_read(&dest->weight)
>
> 	In this case I see only one difference between
> __u64 and __s64:
>
> -	jb	.L41	#,
> -	ja	.L79	#,
> +	jl	.L41	#,
> +	jg	.L79	#,
>
> 2. Your patch:
>
> 	unsigned int loh, doh;
>
> 	(__u64) loh * (__u32) atomic_read(&dest->weight)
> 	or
> 	(__s64) loh * (__u32) atomic_read(&dest->weight)
>
> 	The two solutions generate code that differs only in
> imul vs. mul, and from what I can find, imul is preferred
> over (faster than) mul. That is why I prefer solution 1;
> it also has fewer casts.
>
> 	So, I think you can change your patch as follows:
>
> 1. Use int for loh, doh. Note that some schedulers use
> 'unsigned int' and should be patched for this definition:
> NQ, SED, WLC.
>
> 2. Use the (__u64) prefix only, with no (__u32) before
> atomic_read, in: LBLC, LBLCR, NQ, SED, WLC:
>
> 	(__u64) loh * atomic_read(&dest->weight) ...
> 	(__u64) doh * ...
>
> 3. Explain in the commit message that we find
> result64 = int32 * int32 faster than result64 = uint32 * uint32,
> and far better than a 64x64 multiply, which is a bit
> slower on older CPUs.

	Simon, any progress on this change? I can take over
and finish it if you prefer.

Regards

--
Julian Anastasov <ja@xxxxxx>
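
For reference, the three cast patterns discussed above can be reproduced
in a minimal standalone sketch (the file name, function names, and test
values here are mine for illustration; this is not the kernel code).
Built for 32-bit x86 with gcc -m32 -O2 -S, the first form should compile
to a single widening imull, the second to a single widening mull, and
the mixed form to a longer full 64x64 multiply sequence:

	/*
	 * mul_demo.c - standalone illustration of the cast patterns.
	 * Build and inspect the assembly with:
	 *
	 *	gcc -m32 -O2 -S mul_demo.c
	 */
	#include <stdint.h>
	#include <stdio.h>

	/* Solution 1: int * int with one 64-bit cast. Both operands
	 * are sign-extended 32-bit values, so gcc can use a single
	 * widening imull on i386.
	 */
	static int64_t mul_s32_s32(int loh, int weight)
	{
		return (int64_t)loh * weight;
	}

	/* Solution 2: unsigned loh with a 32-bit cast on the weight.
	 * Both operands are zero-extended 32-bit values, giving a
	 * single widening mull.
	 */
	static uint64_t mul_u32_u32(unsigned int loh, int weight)
	{
		return (uint64_t)loh * (uint32_t)weight;
	}

	/* The culprit: mixed signedness. loh is zero-extended while
	 * weight is sign-extended, so no single widening multiply
	 * applies and gcc falls back to a full 64x64 sequence.
	 */
	static uint64_t mul_mixed(unsigned int loh, int weight)
	{
		return (uint64_t)loh * weight;
	}

	int main(void)
	{
		printf("%lld %llu %llu\n",
		       (long long)mul_s32_s32(1000, 3),
		       (unsigned long long)mul_u32_u32(1000, 3),
		       (unsigned long long)mul_mixed(1000, 3));
		return 0;
	}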
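A rough sketch of what step 2 would look like in one of the WLC-style
schedulers (schematic only, not a verbatim hunk against ip_vs_wlc.c;
dest, least, and the surrounding loop are assumed from context):

	int loh, doh;
	...
	/* 64-bit compare of two 32x32 widening products; casting one
	 * operand to __u64 widens the whole multiplication, so no
	 * __u32 cast on atomic_read() is needed.
	 */
	if ((__u64)loh * atomic_read(&dest->weight) >
	    (__u64)doh * atomic_read(&least->weight))
		continue;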