On Wed, Apr 13 2022, brian m. carlson wrote:

> [[PGP Signed Part:Undecided]]
> In my day-to-day work, I have the occasion to use GitHub Codespaces on a
> repository with about 20,000 refs on the server. The environment is set
> up to pre-clone the repository, but I use a different default remote
> name than "origin" ("def", to be particular), and thus, one of the things
> I do when I set up that environment is to run "git remote rename origin
> def".

Aside from how we'd do renames with transactions, do you know about
clone.defaultRemoteName and --origin?

> This process takes 35 minutes, which is extremely pathological. I
> believe what's happening is that all of the refs are packed, and
> renaming the ref causes a loose ref to be created and the old ref to be
> deleted (necessitating a rewrite of the packed-refs file). This is
> essentially O(N^2) in the order of refs.
>
> We recently added a --progress option, but I think this performance is
> bad enough that that's not going to suffice here, and we should try to
> do better.
>
> I found that using "git for-each-ref" and "git update-ref --stdin" in a
> pipeline to create and delete the refs as a single transaction takes a
> little over 2 seconds. This is greater than a 99.9% improvement and is
> much more along the line of what I'd expect.
>
> I thought about porting this code to use a ref transaction, but I
> realized that we don't rename reflogs in that situation, which might be
> a problem for some people. In my case, since it's a freshly cloned repo
> and the reflogs aren't interesting, I don't care.

There was a (small) thread as a follow-up to that "rename --progress"
patch at the time, did you spot/read that?:
https://lore.kernel.org/git/220302.865yow6u8a.gmgdl@xxxxxxxxxxxxxxxxxxx/

There's doubtless other previous discussions, I just haven't
found/remember them.

I have (briefly) tried hacking on this myself in the past. As anyone
who pokes at this will no doubt find, the non-ref-transaction "branch
rename" and "branch copy" code paths are basically other callers with
the same problem.

Before I go any further I think it's good to know how far down this
particular rabbit hole you already are...

> I think a possible way forward may be to either teach ref transactions
> about ref renames, or simply to add a --no-reflogs option, which omits
> the reflogs in case the user doesn't care. I'm interested to hear ideas
> from others, though, about the best way forward.

More generally, probably:

 1. Teach transactions about N operations on the same refname, which
    they'll currently die on; renames are one case.

 2. Be able to "hook in" to them. Updating reflogs is one special case,
    but we have the same inherent issue with updating config in
    lockstep with transactions.
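
For reference, the "git for-each-ref" into "git update-ref --stdin"
pipeline quoted above could be sketched roughly as follows. This is
only an illustration, not brian's actual script: the helper names
(build_rename_commands, rename_remote_refs) are made up here, and it
assumes the refs to move live under refs/remotes/<old>/. It skips
symbolic refs such as <old>/HEAD, and it does not touch reflogs or the
remote.<name>.* config, which is exactly the gap discussed in this
thread.

```python
import subprocess
import sys

# Fields: object id, full refname, symref target (empty for ordinary refs).
FORMAT = "%(objectname) %(refname) %(symref)"


def build_rename_commands(listing: str, old: str, new: str) -> str:
    """Turn for-each-ref output into `git update-ref --stdin` commands."""
    old_prefix = f"refs/remotes/{old}/"
    new_prefix = f"refs/remotes/{new}/"
    commands = []
    for line in listing.splitlines():
        fields = line.split(" ")
        if len(fields) < 2 or not fields[1].startswith(old_prefix):
            continue
        oid, refname = fields[0], fields[1]
        if len(fields) > 2 and fields[2]:
            # Symbolic ref (e.g. origin/HEAD): skipped, handle separately.
            continue
        target = new_prefix + refname[len(old_prefix):]
        # Create the new ref and delete the old one; the old value on
        # "delete" makes the update fail if the ref moved in the meantime.
        commands.append(f"create {target} {oid}")
        commands.append(f"delete {refname} {oid}")
    return "".join(command + "\n" for command in commands)


def rename_remote_refs(old: str, new: str) -> None:
    """Hypothetical driver: list the refs, then apply all updates at once."""
    listing = subprocess.run(
        ["git", "for-each-ref", f"--format={FORMAT}", f"refs/remotes/{old}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # A single update-ref invocation, so every create/delete is part of
    # one ref transaction instead of one packed-refs rewrite per ref.
    subprocess.run(
        ["git", "update-ref", "--stdin"],
        input=build_rename_commands(listing, old, new),
        check=True, text=True,
    )


if __name__ == "__main__":
    rename_remote_refs(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as rename-refs.py, it would be run
from inside the repository as "python3 rename-refs.py origin def".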