Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 29, 2021 at 04:34:07PM +0000, Michał Mirosław wrote:
> Dnia 29 listopada 2021 06:38:39 UTC, Yury Norov <yury.norov@xxxxxxxxx> napisał/a:
> >On Sun, Nov 28, 2021 at 07:03:41PM +0100, mirq-test@xxxxxxxxxxxx wrote:
> >> On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
> >> > In many cases people use bitmap_weight()-based functions like this:
> >> > 
> >> > 	if (num_present_cpus() > 1)
> >> > 		do_something();
> >> > 
> >> > This may take considerable amount of time on many-cpus machines because
> >> > num_present_cpus() will traverse every word of underlying cpumask
> >> > unconditionally.
> >> > 
> >> > We can significantly improve on it for many real cases if stop traversing
> >> > the mask as soon as we count present cpus to any number greater than 1:
> >> > 
> >> > 	if (num_present_cpus_gt(1))
> >> > 		do_something();
> >> > 
> >> > To implement this idea, the series adds bitmap_weight_{eq,gt,le}
> >> > functions together with corresponding wrappers in cpumask and nodemask.
> >> 
> >> Having slept on it I have more structured thoughts:
> >> 
> >> First, I like substituting bitmap_empty/full where possible - I think
> >> the change stands on its own, so could be split and sent as is.
> >
> >Ok, I can do it.
> >
> >> I don't like the proposed API very much. One problem is that it hides
> >> the comparison operator and makes call sites less readable:
> >> 
> >> 	bitmap_weight(...) > N
> >> 
> >> becomes:
> >> 
> >> 	bitmap_weight_gt(..., N)
> >> 
> >> and:
> >> 	bitmap_weight(...) <= N
> >> 
> >> becomes:
> >> 
> >> 	bitmap_weight_lt(..., N+1)
> >> or:
> >> 	!bitmap_weight_gt(..., N)
> >> 
> >> I'd rather see something resembling memcmp() API that's known enough
> >> to be easier to grasp. For above examples:
> >> 
> >> 	bitmap_weight_cmp(..., N) > 0
> >> 	bitmap_weight_cmp(..., N) <= 0
> >> 	...
> >
> >bitmap_weight_cmp() cannot be efficient. Consider this example:
> >
> >bitmap_weight_lt(1000 0000 0000 0000, 1) == false
> >                 ^
> >                 stop here
> >
> >bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
> >                                 ^
> >                                 stop here
> >
> >I agree that '_gt' is less verbose than '>', but the advantage of 
> >'_gt' over '>' is proportional to length of bitmap, and it means
> >that this API should exist.
> 
> Thank you for the example. Indeed, for less-than to be efficient here you would need to replace
>  bitmap_weight_cmp(..., N) < 0
> with
>  bitmap_weight_cmp(..., N-1) <= 0

Indeed, thanks for pointing to it.
 
> It would still be more readable, I think.

To be honest, I'm not sure that
        bitmap_weight_cmp(..., N-1) <= 0
would be an obvious replacement for the original
        bitmap_weight(...) < N
comparing to 
        bitmap_weight_lt(..., N)

I think the best thing I can do is to add bitmap_weight_cmp() as
you suggested, and turn lt and others to be wrappers on it. This
will let people choose a better function in each case.

I also think that for v2 it would be better to drop the conversion
for short bitmaps, except for switching to bitmap_empty(), because
in that case readability wins over performance; if no objections. 

Thanks,
Yury



[Index of Archives]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux