Re: [PATCH v3 2/8] maintenance: add loose-objects task

On 9/22/2020 7:09 PM, Jonathan Tan wrote:
>> Create a 'loose-objects' task for the 'git maintenance run' command.
>> This helps clean up loose objects without disrupting concurrent Git
>> commands using the following sequence of events:
>>
>> 1. Run 'git prune-packed' to delete any loose objects that exist
>>    in a pack-file. Concurrent commands will prefer the packed
>>    version of the object to the loose version. (Of course, there
>>    are exceptions for commands that specifically care about the
>>    location of an object. These are rare for a user to run on
>>    purpose, and we hope a user that has selected background
>>    maintenance will not be trying to do foreground maintenance.)
>>
>> 2. Run 'git pack-objects' on a batch of loose objects. These
>>    objects are grouped by scanning the loose object directories in
>>    lexicographic order until listing all loose objects -or-
>>    reaching 50,000 objects. This is more than enough if the loose
>>    objects are created only by a user doing normal development.
>>    We noticed users with _millions_ of loose objects because VFS
>>    for Git downloads blobs on-demand when a file read operation
>>    requires populating a virtual file. 
> 
> [snip]
> 
>> This has potential of
>>    happening in partial clones if someone runs 'git grep' or
>>    otherwise evades the batch-download feature for requesting
>>    promisor objects.
> 
> This part is not strictly true, as even when Git lazy-fetches one
> object, it fetches it in the form of a packfile - so maybe remove this
> sentence.

This is a good point. I just did some testing and we do store these
single-object downloads as pack-files. My misunderstanding came from
my experience with the GVFS protocol, where on-demand blob downloads
are written as loose objects.

I have also heard that "git fetch" might explode some small pack-files
into loose objects, and I guess I expected the same here. However, that
is not the case for partial clone. I'll remove this.

> This is nevertheless a good feature to have - loose objects may not be
> created during lazy fetches, but they definitely are created during
> normal operation (e.g. commits). Git, as a whole, prefers packfiles over
> loose objects, and just packing the loose objects themselves instead of
> running repack (which goes through all reachable objects) is definitely
> better for large repositories.
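
For anyone who wants to see the effect by hand, the two steps from the
commit message can be approximated with plumbing commands. This is only
a rough sketch under stated assumptions, not the code in the patch: it
assumes a throwaway non-bare repository, uses 'find | sort | head' as a
stand-in for the lexicographic directory scan, and picks the pack-file
prefix "loose" purely for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Create a few loose objects; 'hash-object -w' writes loose blobs.
for i in 1 2 3 4 5; do
	echo "blob $i" | git hash-object -w --stdin >/dev/null
done

# Helper: read one numeric field from 'git count-objects -v'.
count_field () {
	git count-objects -v | sed -n "s/^$1: //p"
}
loose_before=$(count_field count)

# Step 1: delete loose objects that already exist in a pack-file.
# Nothing is packed yet, so this is a no-op on the first run.
git prune-packed -q

# Step 2: pack a batch of loose objects. The fan-out directories are
# listed in lexicographic order and the batch is capped at 50,000.
# Stripping the '/' from "ab/cdef..." yields the full object id.
(cd .git/objects && find [0-9a-f][0-9a-f] -type f) |
	sort | head -n 50000 | tr -d / |
	git pack-objects -q .git/objects/pack/loose >/dev/null

# The loose copies are still present at this point; readers simply
# prefer the packed copies. They go away on the next prune-packed.
loose_mid=$(count_field count)
in_pack=$(count_field in-pack)

git prune-packed -q
loose_after=$(count_field count)

echo "loose before: $loose_before, after packing: $loose_mid," \
	"in pack: $in_pack, after second prune-packed: $loose_after"
```

Note that the loose copies only disappear on the second 'git
prune-packed', which mirrors the ordering in the commit message: each
run first cleans up what the previous run packed, so concurrent readers
always have a packed copy available before the loose one is deleted.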

Thanks,
-Stolee



