Re: [PATCH] mktree: learn about promised objects

On 6/16/2022 2:07 AM, Jeff King wrote:
> On Wed, Jun 15, 2022 at 02:17:58PM -0400, Derrick Stolee wrote:
> 
>> On 6/15/2022 1:40 PM, Richard Oliver wrote:
>>> On 15/06/2022 05:00, Jeff King wrote:
>>
>>>> So it is not just lookup, but actual tree walking that is expensive. The
>>>> flip side is that you don't have to store a complete separate list of
>>>> the promised objects. Whether that's a win depends on how many local
>>>> objects you have, versus how many are promised.
>>
>> This is also why blobless (or blob-size filters) are the recommended way
>> to use partial clone. It's just too expensive to have tree misses.
> 
> I agree that tree misses are awful, but I'm actually talking about
> something different: traversing the local trees we _do_ have in order to
> find the set of promised objects. Which is worse for blob:none, because
> it means you have more trees locally. :)

Ah, I misread your email. I agree that walking trees is far too
expensive to do just to find an object type.

> Try this with a big repo like linux.git:
> 
>   git clone --no-local --filter=blob:none linux.git repo
>   cd repo
> 
>   # this is fast; we mark the promisor trees as UNINTERESTING, so we do
>   # not look at them as part of the traversal, and never call
>   # is_promisor_object().
>   time git rev-list --count --objects --all --exclude-promisor-objects
> 
>   # but imagine we had a fixed mktree[1] that did not fault in the blobs
>   # unnecessarily, and we made a new tree that references a promised
>   # blob.
>   tree=$(git ls-tree HEAD~1000 | grep Makefile | git mktree --missing)
>   commit=$(echo foo | git commit-tree -p HEAD $tree)
>   git update-ref refs/heads/foo $commit
> 
>   # this is now slow; even though we only call is_promisor_object()
>   # once, we have to open every single tree in the pack to find it!
>   time git rev-list --count --objects --all --exclude-promisor-objects
> 
> Those rev-lists run in 1.7s and 224s respectively. Ouch!

This is exactly why I thought that just asking for the objects
directly would be faster than scanning all the packs. Thanks for
giving concrete numbers that support that assumption.
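
For illustration only, here is a toy sketch of the two lookup
strategies: asking the object database about one id directly, versus
enumerating every object and searching the list. It assumes a throwaway
repository created under mktemp, and the file name and commit identity
are made up. It does not reproduce the partial-clone case (where a
direct query may fetch from the promisor remote), nor the tree walk in
is_promisor_object(); it only shows the shape of the trade-off.

```shell
#!/bin/sh
set -e

# Hypothetical scratch repository; nothing here touches a real clone.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello >file
git add file
git -c user.name=example -c user.email=example@example.com \
	commit -q -m initial

blob=$(git rev-parse HEAD:file)

# Direct lookup: a single query for the one id we care about.
direct_type=$(git cat-file -t "$blob")

# Scan: list every object in the repository, then search for the id.
# The cost grows with the total number of objects, not with the number
# of ids we actually want to know about.
scanned_type=$(git cat-file --batch-all-objects \
	--batch-check='%(objectname) %(objecttype)' |
	awk -v id="$blob" '$1 == id { print $2 }')

echo "direct=$direct_type scanned=$scanned_type"
```

Both approaches give the same answer here; the difference is only in
how much of the object store each one has to touch.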

Thanks,
-Stolee


