Hello Jens,

On Sun, 2022-01-09 at 19:43 -0700, Jens Axboe wrote:
> On 1/9/22 7:38 PM, Ming Lei wrote:
> > On Sun, Jan 09, 2022 at 06:54:21PM -0700, Jens Axboe wrote:
> > > On 1/9/22 6:50 PM, Ming Lei wrote:
> > > > Only the last sbitmap_word can have a different depth; all the
> > > > others must have the same depth of 1U << sb->shift, so it is not
> > > > necessary to store it in sbitmap_word, and it can be retrieved
> > > > easily and efficiently by adding one internal helper,
> > > > __map_depth(sb, index).
> > > >
> > > > No performance effect was seen when running an iops test on
> > > > null_blk.
> > > >
> > > > This way saves one cacheline (usually 64 bytes) per
> > > > sbitmap_word.
> > >
> > > We probably want to kill the ____cacheline_aligned_in_smp from
> > > 'word' as well.
> >
> > But sbitmap_deferred_clear_bit() is called in the put fast path, so
> > the cacheline would become shared with the get path, and I guess
> > that isn't expected.
>
> Just from 'word', not from 'cleared'. They will still be in separate
> cache lines, but it usually doesn't make sense to have the leading
> member marked as cacheline aligned; that's a whole-struct property at
> that point.

While discussing this - is there any data on how many separate cache
lines (for either "word" or "cleared") are beneficial for performance?

For bitmap sizes between 4 and 512 bits (on x86_64), the code generates
layouts with 4-8 cache lines, but above that, the number of cache lines
grows linearly with the bitmap size. I am wondering whether we should
consider utilizing more of the allocated memory once a certain number
of separate cache lines is exceeded, by accessing additional words in
the existing cache lines.

Could you comment on that?

Thanks,
Martin
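
P.S.: For reference, my reading of the layout under discussion is
roughly the sketch below. It is based on the patch description rather
than the exact code in the series, so the comments and the remark about
sbitmap_calculate_shift() reflect my own understanding:

    struct sbitmap_word {
            /* free bits handed out by the get path */
            unsigned long word;

            /* deferred-cleared bits, set by the put fast path; kept on
             * a separate cache line so puts don't bounce the get line
             */
            unsigned long cleared ____cacheline_aligned_in_smp;
    } ____cacheline_aligned_in_smp;

    /*
     * Every word except the last covers a full 1U << sb->shift bits,
     * so the per-word depth no longer needs to be stored:
     */
    static inline unsigned int __map_depth(const struct sbitmap *sb,
                                           int index)
    {
            if (index == sb->map_nr - 1)
                    return sb->depth - (index << sb->shift);
            return 1U << sb->shift;
    }

If I read sbitmap_calculate_shift() correctly, the shift is only
reduced for small depths, so beyond that every additional BITS_PER_LONG
bits adds another sbitmap_word - and with it another pair of sparsely
used cache lines - which is what the question above is aiming at.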