Re: [PATCH v2 4/6] pack-bitmap: prepare to read lookup table

On Tue, Jun 28, 2022 at 02:29:50PM +0530, Abhradeep Chakraborty wrote:
> Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> > ...exactly my thoughts, too. It's possible that it would be faster to
> > key this search on the object_id "oid" above, and then convert each of
> > the entries in the lookup table from a uint32_t into an object_id by
> > calling nth_bitmap_object_oid() repeatedly.
> >
> > I *think* that what Abhradeep wrote here is going to be faster more
> > often than not since it makes more efficient use of the page cache
> > rather than switching between reads across different memory mapped
> > regions at each point in the binary search.
> >
> > But of course that depends on a number of factors. Abhradeep: if you're
> > up for it, I think it would be worth trying it both ways and seeing if
> > one produces a meaningful speed-up or slow-down over the other. Like I
> > said: my guess is that what you have now will be faster, but I don't
> > have a clear sense that that is true without trying it both ways ;-).
>
> Ok, let me try both ways. In my opinion, my version involves less
> searching and less computation, so I want to stick with it. But I'd
> also like to try the other one so that we can get the best of the
> two.

Yeah, I agree with your general sense that the version as written is
going to be faster. We're comparing a smaller datatype (IOW, a 4-byte
integer that can be checked for equality in a single instruction,
instead of comparing two 20-byte OIDs), and likely flushing the cache
far less often.

But having two concrete implementations to compare will help us know for
a fact that our intuition is correct.

I'll be curious to see what you find here!

> > I think starting off with a small array and then letting it grow
> > according to alloc_nr() would be fine here, since it will grow more and
> > more each time, so the amount of times we have to reallocate the buffer
> > will tail off over time.
>
> What should be the size of that array?

I think some small power of two would be a reasonable choice here.

> > If we were really concerned about it, we could treat the buffer as a
> > static pointer and reuse it over time (making sure to clear out the
> > portions of it we're going to reuse, or otherwise ensuring that we don't
> > read old data). But I doubt it matters much either way in practice: the
> > individual records are small (at just 4 bytes each) and entry_count is
> > often less than 1,000, so I think this probably has a vanishingly small
> > impact.
>
> Before submitting it to the mailing list, I did use the ALLOC_GROW
> macro. But my version was worse than yours: for every iteration I was
> reallocating the array to support `size+1` positions. I later dropped
> that code as it could be very expensive.

That shouldn't be the case. When you have a chance, take a look at the
alloc_nr macro, which shows how much memory we allocate at each
step:

    #define alloc_nr(x) (((x)+16)*3/2)

Suppose we allocated 16 slots initially, so nr (the number of entries
stored in the list) is 0 and alloc (the number of entries allocated) is
16. Then when we try to add the 17th item, we'll pass 16 to alloc_nr
which will allocate 48 slots. Then 96, then 168, and so on.

We only have to reallocate and copy the array when nr > alloc, which
should be fairly infrequently, and happens less and less often the
larger the array grows.

Thanks,
Taylor
