In my day-to-day work, I use GitHub Codespaces on a repository with about 20,000 refs on the server. The environment is set up to pre-clone the repository, but I use a default remote name other than "origin" (specifically, "def"), so one of the things I do when setting up that environment is run "git remote rename origin def".

This process takes 35 minutes, which is pathological. I believe what's happening is that all of the refs are packed, and renaming each ref causes a loose ref to be created and the old ref to be deleted, which necessitates a rewrite of the packed-refs file. With N refs, that means N rewrites of a file with O(N) entries, so the whole operation is essentially O(N^2) in the number of refs.

We recently added a --progress option, but I think this performance is bad enough that that's not going to suffice here, and we should try to do better.

I found that using "git for-each-ref" and "git update-ref --stdin" in a pipeline to create and delete the refs as a single transaction takes a little over 2 seconds. That is an improvement of roughly 99.9% and is much more along the lines of what I'd expect.

I thought about porting this code to use a ref transaction, but I realized that we don't rename reflogs in that situation, which might be a problem for some people. In my case, since it's a freshly cloned repo and the reflogs aren't interesting, I don't care.

I think a possible way forward may be either to teach ref transactions about ref renames, or simply to add a --no-reflogs option, which omits the reflogs in case the user doesn't care. I'm interested to hear ideas from others, though, about the best way forward.

-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
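
For reference, here is a minimal sketch of the kind of "git for-each-ref" plus "git update-ref --stdin" pipeline described above. It is not the exact commands from the message, which are not shown. The function name rename_remote_refs is hypothetical. The sketch assumes that skipping symbolic refs such as refs/remotes/origin/HEAD is acceptable, and it does not carry over reflogs or touch the remote's configuration.

```shell
#!/bin/sh
# Sketch only: move every remote-tracking ref of remote $1 to remote $2
# in one ref transaction. Reflogs are NOT renamed, and the remote.* and
# branch.* configuration is left untouched.
rename_remote_refs() {
    old=$1
    new=$2
    git for-each-ref \
        --format='%(objectname) %(refname) %(symref)' \
        "refs/remotes/$old/" |
    while read -r oid ref symref; do
        # Assumption: symbolic refs (e.g. <remote>/HEAD) are skipped and
        # would have to be recreated separately.
        [ -n "$symref" ] && continue
        suffix=${ref#"refs/remotes/$old/"}
        printf 'create refs/remotes/%s/%s %s\n' "$new" "$suffix" "$oid"
        printf 'delete %s %s\n' "$ref" "$oid"
    done |
    # All create/delete commands are applied as a single transaction.
    git update-ref --stdin
}
```

Called as "rename_remote_refs origin def" inside the repository, this queues one create and one delete per ref and applies them together, so the packed-refs file should only need to be rewritten once rather than once per ref.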