Re: [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Jeff King wrote:

> Subject: [PATCH] send-pack: use OBJECT_INFO_QUICK to check negative objects
>
> When pushing, we feed pack-objects a list of both positive and negative
> objects. The positive objects are what we want to send, and the negative
> objects are what the other side told us they have, which we can use to
> limit the size of the push.
>
> Before passing along a negative object, send_pack() will make sure we
> actually have it (since we only know about it because the remote
> mentioned it, not because it's one of our refs). So it's expected that
> some of these objects will be missing on the local side. But looking for
> a missing object is more expensive than one that we have: it triggers
> reprepare_packed_git() to handle a racy repack, plus it has to explore
> every alternate's loose object tree (which can be slow if you have a lot
> of them, or have a high-latency filesystem).

Nice analysis.

> This isn't usually a big problem, since repositories you're pushing to
> don't generally have a large number of refs that are unrelated to what
> the client has. But there's no reason such a setup is wrong, and it
> currently performs poorly.
>
> We can fix this by using OBJECT_INFO_QUICK, which tells the lookup
> code that we expect objects to be missing. Notably, it will not re-scan
> the packs, and it will use the loose cache from 61c7711cfe (sha1-file:
> use loose object cache for quick existence check, 2018-11-12).

On first reading, I wondered how this would interact with alternates,
since you had mentioned that checking alternates can be expensive.  Does
this go too far in that direction by treating an object as missing
whenever it's not in the local object store, even if it's available from
an alternate?

But I believe that was a misreading.  With this patch, we still do pay
the cost of checking alternates for the missing object.  The savings
is instead about having to *double* check.

Am I understanding correctly?

[...]
> Signed-off-by: Jeff King <peff@xxxxxxxx>

Reviewed-by: Jonathan Nieder <jrnieder@xxxxxxxxx>

> ---
> Interestingly, upload-pack does not use OBJECT_INFO_QUICK when it's
> getting oids from the other side. But I think it could possibly benefit
> in the same way. Nobody seems to have noticed. Perhaps it simply comes
> up less, as servers would tend to have more objects than their clients?

I like to imagine that servers are also more likely to keep a tidy set
of packs and to avoid alternates.  But using INFO_QUICK when checking
the fetcher's "have"s does sound like a sensible change to me.

Thanks,
Jonathan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux