On Jun 26, 2014, at 12:38 PM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:

> On June 26, 2014 11:41:48 AM EDT, "Atchley, Scott" <atchleyes@xxxxxxxx> wrote:
>> On Jun 26, 2014, at 10:55 AM, James Bottomley
>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> On Thu, 2014-06-26 at 16:53 +0200, Bart Van Assche wrote:
>>>> On 06/11/14 11:09, Sagi Grimberg wrote:
>>>>> + return xfer_len + (xfer_len >> ilog2(sector_size)) * 8;
>>>>
>>>> Sorry that I just noticed this now, but why is a shift-right and ilog2()
>>>> used in the above expression instead of just dividing the transfer
>>>> length by the sector size ?
>>>
>>> It's a performance thing. Division is really slow on most CPUs.
>>> However, we know the divisor is a power of two so we re-express the
>>> division as a shift, which the processor can do really fast.
>>>
>>> James
>>
>> I have done this in the past as well, but have you benchmarked it?
>> Compilers typically do the right thing in this case (i.e. replace
>> division with shift).
>
> The compiler can only do that for values which are reducible to constants
> at compile time. This is a runtime value, the compiler has no way of
> deducing that it will be a power of 2
>
> James

You're right, I should have said runtime. However, gcc on Intel seems to choose the right algorithm at runtime. On a trivial app with -O0, I see the same performance for shift and division if the divisor is a power of two. I see a ~38% penalty if the divisor is not a power of 2.

With -O3, shift is faster than division by about 17% when the divisor is a power of two.

Scott
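
For reference, here is a minimal userspace sketch of the two forms being compared. It is not the trivial app I benchmarked, only an illustration. The helper log2_pow2() is a hypothetical stand-in for the kernel's ilog2() and is valid only for nonzero powers of two; the function names and the uint32_t types are likewise assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's ilog2().
 * Assumption: v is a nonzero power of two, as a sector size would be.
 */
static unsigned int log2_pow2(uint32_t v)
{
	unsigned int shift = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Shift form, mirroring the quoted patch line: 8 extra bytes per sector. */
static uint32_t xfer_len_shift(uint32_t xfer_len, uint32_t sector_size)
{
	return xfer_len + (xfer_len >> log2_pow2(sector_size)) * 8;
}

/* Division form, as suggested in Bart's question. */
static uint32_t xfer_len_div(uint32_t xfer_len, uint32_t sector_size)
{
	return xfer_len + (xfer_len / sector_size) * 8;
}
```

For a 4096-byte transfer with 512-byte sectors, both forms give 4096 + 8 * 8 = 4160. The results agree only because sector_size is a power of two; the shift form has no meaning for any other divisor.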