Re: [PATCH] midx: traverse the local MIDX first

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 28, 2020 at 02:50:39PM -0400, Jeff King wrote:
> On Fri, Aug 28, 2020 at 02:06:21PM -0400, Taylor Blau wrote:
> [ snip ]
>
> > This patch bandaids that situation by trying to place the local MIDX
> > first in the chain when calling 'prepare_multi_pack_index_one()'.
> > Specifically, it always inserts all MIDXs loads subsequent to the local
> > one as the head of the tail of the MIDX chain. This makes it so that we
> > don't have a quadratic insertion while still keeping the local MIDX at
> > the front of the list. Likewise, it avoids an 'O(m*n)' lookup in
> > 'remove_redundant_pack()' where 'm' is the number of redundant packs and
> > 'n' is the number of alternates.
> >
> > Protect 'remove_redundant_pack()' against a case where alternates with
> > MIDXs exist, but no local MIDX exists by first checking that 'm->local'
> > before asking whether or not it contains the requested pack.
>
> It seems like the root of the problem is that get_multi_pack_index() is
> being used for two different things:
>
>   - most callers want to iterate over all of the possible midx files,
>     because they're trying to look up an object.
>
>   - this caller wants the single midx for the local repository (I
>     wondered if there might be more of these that we just never noticed
>     because the test suite doesn't use alternates, but having just
>     audited them all, the answer is no).
>
> So I'd be tempted to say that the latter callers should be using a
> separate function that gives them what they want. That lets them avoid
> being too intimate with the details of how we order things.
>
> The patch below illustrates that.  It also changes the existing function
> name to avoid confusion and to help audit the existing callers, but
> that's optional and maybe not worth it.
>
> It does do the linear lookup, so it has the O(m*n) you mentioned. But
> the number of alternates is generally small, and I'd be shocked if this
> was the first m*n loop over the number of alternates. However, we could
> still do the ordering thing and just keep the details inside the new
> function.

I'd be much happier with this patch.

If you wanted to go further, we could do both, and tighten up the
insertion code to make sure that the local MIDX is always first from the
perspective of 'r->objects->multi_pack_index' so that the linear lookup
drops to constant.

But, it might be overkill, since I also tend to think that the number of
alternates is small, and we're not even talking about a difference of
milliseconds here.

So, I'm happy to clean it up for you and forge your sign-off with
permission, or otherwise you're welcome to do so, too.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux