On Fri, Oct 06, 2023 at 03:35:25PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > When using merge-tree often within a repository[^1], it is possible to
> > generate a relatively large number of loose objects, which can result in
> > degraded performance, and inode exhaustion in extreme cases.
>
> Well, be it "git merge-tree" or "git merge", new loose objects tend
> to accumulate until "gc" kicks in, so it is not a new problem for
> mere mortals, is it?

Yeah, I would definitely suspect that this is more of an issue for
forges than individual Git users.

> As one "interesting" use case of "merge-tree" is for a Git hosting
> site with bare repositories to offer trial merges, without which
> majority of the object their repositories acquire would have been in
> packs pushed by their users, "Gee, loose objects consume many inodes
> in exchange for easier selective pruning" becomes an issue, right?

Right.

> Just like it hurts performance to have too many loose object files,
> presumably it would also hurt performance to keep too many packs,
> each came from such a trial merge.  Do we have a "gc" story offered
> for these packs created by the new feature?  E.g., "once merge-tree
> is done creating a trial merge, we can discard the objects created
> in the pack, because we never expose new objects in the pack to the
> outside, processes running simultaneously, so instead closing the
> new packfile by calling flush_bulk_checkin_packfile(), we can safely
> unlink the temporary pack.  We do not even need to spend cycles to
> run a gc that requires cycles to enumerate what is still reachable",
> or something like that?

I know Johannes worked on something like this recently. IIRC, it
effectively does something like:

    struct tmp_objdir *tmp_objdir = tmp_objdir_create(...);
    tmp_objdir_replace_primary_odb(tmp_objdir, 1);

at the beginning of a merge operation, and:

    tmp_objdir_discard_objects(tmp_objdir);

at the end.
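(For context, a minimal sketch of the accumulation problem being
discussed. The repository name and file names are made up, and "git
add"/"git commit" stand in for the objects a trial merge would write;
the point is only that each object written outside a pack costs one
loose file, i.e. one inode, until gc consolidates them:)

```shell
# Hypothetical throwaway repo; each new object becomes one loose
# file (one inode) under .git/objects/xx/ until gc runs.
git init -q demo && cd demo

# Write 100 distinct blobs and commit them.
for i in $(seq 1 100); do echo "content $i" > "f$i"; done
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm demo

# Loose objects live in the two-hex-digit fan-out directories.
loose_before=$(find .git/objects -type f -path '*/??/*' | wc -l)
echo "loose objects before gc: $loose_before"   # 100 blobs + tree + commit

# gc consolidates reachable objects into a single pack and deletes
# the loose copies it packed.
git gc -q --prune=now
loose_after=$(find .git/objects -type f -path '*/??/*' | wc -l)
packs=$(find .git/objects/pack -name '*.pack' | wc -l)
echo "loose objects after gc: $loose_after, packs: $packs"
```

Until that gc actually runs, though, every trial merge keeps paying
the per-object inode cost, which is the forge-scale concern above.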
I haven't followed that work off-list very closely, but it is only
possible for GitHub to discard certain niche kinds of merges/rebases,
since in general we make the objects created during test merges
available via refs/pull/N/{merge,rebase}.

I think that like anything, this is a trade-off. Having lots of packs
can be a performance hindrance just like having lots of loose objects.
But since we can represent more objects with fewer inodes when packed,
storing those objects together in a pack is preferable when (a) you're
doing lots of test-merges, and (b) you want to keep those objects
around, e.g., because they are reachable.

Thanks,
Taylor
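(A small illustration of the inode side of that trade-off, using a
made-up repository; `git count-objects -v` reports both halves:
"count" is loose objects, one file each, while "in-pack" objects share
a single pack file's inodes:)

```shell
# Hypothetical repo with 50 files committed, so ~52 objects in total
# (50 blobs + 1 tree + 1 commit), all initially loose.
git init -q tradeoff && cd tradeoff
for i in $(seq 1 50); do echo "v$i" > "f$i"; done
git add .
git -c user.name=demo -c user.email=demo@example.com commit -qm demo

# Repack everything reachable into one pack and drop the loose copies.
git repack -a -d -q

# count = remaining loose objects; in-pack = objects stored in packs.
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
inpack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
npacks=$(git count-objects -v | awk '/^packs:/ {print $2}')
echo "loose: $loose, in-pack: $inpack, packs: $npacks"
```

So when the objects have to stay around anyway (as with the
refs/pull/N/{merge,rebase} case above), one pack holding them all is
far cheaper in inodes than the same objects stored loose.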