	Hello,

On Mon, 5 Aug 2013, Simon Kirby wrote:

> > 1. same as my first email:
> >
> > 	int loh, doh;
> >
> > 	(__u64/__s64) loh * atomic_read(&dest->weight)
> >
> > In this case I see only one difference between
> > __u64 and __s64:
> >
> > -	jb	.L41	#,
> > -	ja	.L79	#,
> > +	jl	.L41	#,
> > +	jg	.L79	#,
> >
> > 2. Your patch:
> >
> > 	unsigned int loh, doh;
> >
> > 	(__u64) loh * (__u32) atomic_read(&dest->weight)
> > 	or
> > 	(__s64) loh * (__u32) atomic_read(&dest->weight)
>
> Did you mean here "(__s64) loh * (__s32) atomic_read(&dest->weight)"?
>
> If not, the results for me on GCC 4.7.2 were what I posted at
> http://0x.ca/sim/ref/3.9-ipvs/.

	Reading your reply, I see that the key issue you are missing here
is that the type of loh/doh matters: it should match the type (and
signedness) of the atomic_read() result. As atomic_t is int, we prefer
loh and doh to be int. The __u64/__s64 cast before loh/doh does not
matter at all. If the two operands do not have the same signedness,
the 32*32 multiply optimization is not applied.

> > Both solutions generate code that differs only
> > in imul vs. mul. From what I see on the internet, imul is
> > preferred/faster than mul. That is why I prefer solution 1:
> > it has fewer casts.
> >
> > So, I think you can change your patch as follows:
> >
> > 1. Use int for loh, doh. Note that some schedulers
> > use 'unsigned int' and should be patched for this
> > definition: NQ, SED, WLC
> >
> > 2. Use the (__u64) prefix only, no (__u32) before atomic_read:
> > LBLC, LBLCR, NQ, SED, WLC
> >
> > 	(__u64) loh * atomic_read(&dest->weight) ...
> > 	(__u64) doh * ...
> >
> > 3. Explain in the commit message that we find
> > result64 = int32 * int32 faster than result64 = uint32 * uint32
> > and far better than using a 64*64 multiply, which is
> > a bit slower on older CPUs.
>
> I found that u64*u32 was faster than u64*s32, but I didn't check s64*s32.

	It is faster because your loh/doh are u32, so a single mul is
generated, while for u64*s32 we have u64 = u32 * s32, which obviously
cannot be implemented with a single imul/mul. Our goal is a single imul.

> I checked now and found that u64*u32 and s64*s32 have the same number of
> output instructions on i386, with just the different signedness tests.
> Both actually use imul since it's the comparison afterwards that cares
> about signedness.

	IIRC, (u64) u32 * u32 uses mul if you compile with optimizations
(-O).

> But what actually makes sense? Do negative weights ever make sense?

	The __u64/__s64 cast does not matter because both operands are
positive values. As a result, this solution looks better:

	int loh, doh;

	(__s64) loh * atomic_read(&dest->weight)

because:

- both operands are 'int' => no extra casts before atomic_read
- 'int', not 'unsigned int', because imul is better than mul

Regards

--
Julian Anastasov <ja@xxxxxx>
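	For reference, here is a minimal userspace sketch of the preferred
pattern. This is not the actual IPVS scheduler code: struct dest, the
dest_overhead()/dest_is_better() names and the overhead formula are made
up for the example, atomic_read() on the atomic_t weight is stood in by
a plain int field, and __s64 by long long.

#include <stdio.h>

/* Stand-in for an IPVS destination; 'weight' replaces
 * atomic_read(&dest->weight) on the real atomic_t. */
struct dest {
	int weight;
	int activeconns;
	int inactconns;
};

/* Illustrative overhead value, kept as plain int like loh/doh above. */
static int dest_overhead(const struct dest *d)
{
	return d->activeconns * 50 + d->inactconns;
}

/*
 * Is 'dest' a better (less loaded per weight) choice than 'least'?
 * loh/doh stay 'int', so each product below is the
 * result64 = int32 * int32 case discussed above, which the compiler
 * can emit as a single imul instead of a full 64x64 multiply.
 */
static int dest_is_better(const struct dest *dest, const struct dest *least)
{
	int loh = dest_overhead(least);
	int doh = dest_overhead(dest);

	return (long long)loh * dest->weight >
	       (long long)doh * least->weight;
}

int main(void)
{
	struct dest least = { .weight = 3, .activeconns = 10, .inactconns = 5 };
	struct dest cand  = { .weight = 1, .activeconns = 2,  .inactconns = 1 };

	printf("candidate is %s than the current least\n",
	       dest_is_better(&cand, &least) ? "better" : "not better");
	return 0;
}

	Rebuilding this with 'unsigned int' or 64-bit loh/doh is an easy
way to compare the multiply instructions the compiler generates for the
different cast choices discussed above.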