On 9/22/2020 7:09 PM, Jonathan Tan wrote:
>> Create a 'loose-objects' task for the 'git maintenance run' command.
>> This helps clean up loose objects without disrupting concurrent Git
>> commands using the following sequence of events:
>>
>> 1. Run 'git prune-packed' to delete any loose objects that exist
>>    in a pack-file. Concurrent commands will prefer the packed
>>    version of the object to the loose version. (Of course, there
>>    are exceptions for commands that specifically care about the
>>    location of an object. These are rare for a user to run on
>>    purpose, and we hope a user that has selected background
>>    maintenance will not be trying to do foreground maintenance.)
>>
>> 2. Run 'git pack-objects' on a batch of loose objects. These
>>    objects are grouped by scanning the loose object directories in
>>    lexicographic order until listing all loose objects -or-
>>    reaching 50,000 objects. This is more than enough if the loose
>>    objects are created only by a user doing normal development.
>>    We noticed users with _millions_ of loose objects because VFS
>>    for Git downloads blobs on-demand when a file read operation
>>    requires populating a virtual file.
>
> [snip]
>
>> This has potential of
>> happening in partial clones if someone runs 'git grep' or
>> otherwise evades the batch-download feature for requesting
>> promisor objects.
>
> This part is not strictly true, as even when Git lazy-fetches one
> object, it fetches it in the form of a packfile - so maybe remove this
> sentence.

This is a good point. I just did some testing and we do store these
single-object downloads as pack-files. My misunderstanding is due to
my own bias and experience with the GVFS protocol. I have also heard
that "git fetch" might explode some small pack-files into loose objects,
and I guess I expected the same here. However, that is not the case for
partial clone.

I'll remove this.

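As an aside, the batch selection in step 2 above can be sketched roughly
as follows. This is only an illustrative Python sketch, not the C code in
the patch: the function name collect_loose_batch and its parameters are
hypothetical, and it assumes the usual loose-object layout of two-hex-digit
fan-out directories under the object directory.

```python
import os
import re

# Fan-out directories are named with the first two hex digits of the object ID.
FANOUT_DIR = re.compile(r"[0-9a-f]{2}")
# The file name holds the remaining hex digits (38 for SHA-1, 62 for SHA-256).
OBJECT_FILE = re.compile(r"[0-9a-f]{38}(?:[0-9a-f]{24})?")


def collect_loose_batch(objects_dir, limit=50000):
    """Return up to `limit` loose object IDs in lexicographic order.

    Hypothetical helper: scans the fan-out directories in sorted order and
    stops as soon as the batch is full or every loose object is listed.
    """
    batch = []
    for fanout in sorted(os.listdir(objects_dir)):
        if not FANOUT_DIR.fullmatch(fanout):
            continue  # skips 'pack', 'info', and anything else
        subdir = os.path.join(objects_dir, fanout)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if not OBJECT_FILE.fullmatch(name):
                continue  # skips temporary files and other non-object names
            batch.append(fanout + name)
            if len(batch) >= limit:
                return batch
    return batch
```

The resulting IDs would then be written, one per line, to the standard
input of 'git pack-objects'.
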
> This is nevertheless a good feature to have - loose objects may not be
> created during lazy fetches, but they definitely are created during
> normal operation (e.g. commits). Git, as a whole, prefers packfiles over
> loose objects, and just packing the loose objects themselves instead of
> running repack (which goes through all reachable objects) is definitely
> better for large repositories.

Thanks,
-Stolee