Re: [PATCH v2 07/19] midx: introduce `bsearch_one_midx()`

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 1 Aug 2024 15:54:34 -0400

On Thu, Aug 01, 2024 at 06:06:35AM -0400, Jeff King wrote:
> One thing that confused me, though, is setting "num". From the "-w"
> diff:
>
>   -       num = m->num_objects;
>   +
>   +               num = m->num_objects + m->num_objects_in_base;
>
>                   if (!num)
>   -               return;
>   +                       continue;
>
> Before we only had one midx, so that was our limit. But now we are
> looking at "num" as a limit in the global size of the chained midx.
> Which feels weird, since we're just considering a single layer here. We
> seem to use "num" in two ways:
>
>   - we return if it's 0 (or now continue to the next layer). But
>     wouldn't we want to do that per-layer? I don't think it will produce
>     wrong answers, but we're less likely to kick in this early return
>     (though it's not clear to me when it would ever kick in really; a
>     zero-length midx?).

This is definitely a bug. We should certainly do something like:

    for (; m; m = m->base_midx) {
            uint32_t num;
            if (!m->num_objects)
                    continue;

            num = m->num_objects + m->num_objects_in_base;
            /* ... */
    }

I'll go ahead and fix this one up locally, which is easy enough to do.

>     So I think it's correct, though it feels like bsearch_one_midx()
>     should still return the position within that midx (and then
>     bsearch_midx() could add back m->num_objects_in_base to get a global
>     position). And then I guess likewise there would need to be a
>     midx-local version of nth_midxed_object_oid().

Like many of the other changes in this series, it's really a matter of
where you put the complexity: either it's in the callers or in the
function itself.

I think here I prefer having bsearch_one_midx() return the global
position, since it is directly usable in other top-level functions
within the MIDX API, like being able to pass it directly to
nth_midxed_object_oid() and etc. below.

Thanks,
Taylor