On Thu, Aug 8, 2019 at 11:15 AM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Wed, Aug 7, 2019 at 9:11 PM Phil Hord <phil.hord@xxxxxxxxx> wrote:
> >
> > From: Phil Hord <phil.hord@xxxxxxxxx>
> >
> > 'git tag -d' accepts one or more tag refs to delete, but each deletion
> > is done by calling `delete_ref` on each argv. This is painfully slow
> > when removing from packed refs. Use delete_refs instead so all the
> > removals can be done inside a single transaction with a single write.
>
> Nice, thanks for working on this.
>
> > I have a repo with 24,000 tags, most of which are not useful to any
> > developers. Having this many refs slows down many operations that
> > would otherwise be very fast. Removing these tags when they've been
> > accidentally fetched again takes about 30 minutes using delete_ref.
>
> I also get really slow times on a repo with ~20,000 tags (though order
> ~3 minutes rather than ~30, probably due to having an SSD on this
> machine) -- but ONLY IF the refs are packed first (git pack-refs
> --all). If the refs are loose, it's relatively quick to delete a
> dozen thousand or so tags (order of a few seconds). It might be worth
> mentioning in the commit message that this only makes a significant
> difference in the case where the refs are packed.

I'm also using an SSD but I still see about 10 tags per second being
deleted with the current code (and packed-refs). I see that I'm
CPU-bound, so I guess most of the time is spent searching through
.git/packed-refs. Probably it will run faster as it progresses. I
guess the 18,000 branches in my repo keep me on the wrong end of O(N).

My VM is on an all-flash storage array, but I can't say much about its
write throughput since it's one VM among many.

Previously I thought I saw a significant speedup between v2.7.4 (on my
development VM) and v2.22.0 (on my laptop). But this week I saw it
was slow again on my laptop. I looked for the regression but didn't
find anyone touching that code. Then I wrote this patch.

But it should have occurred to me while I was in the code that there
is a different path for unpacked refs which could explain my previous
speeds. I didn't think I had any unpacked refs, though, since every
time I look in .git/refs for what I want, I find it relatively empty.
I see 'git pack-refs --help' says that new refs should show up loose,
but I can't say that has happened for me. Maybe a new clone uses
packed-refs for *everything* and only newly fetched things are loose.
Is that it? I guess since I seldom fetch tags after the first clone,
it makes sense they would all be packed.

> > git tag -l feature/* | xargs git tag -d
> >
> > Removing the same tags using delete_refs takes less than 5 seconds.
>
> It appears this same bug also affects `git branch -d` when deleting
> lots of branches (or remote tracking branches) and they are all
> packed; could you apply the same fix there?

Will do.

> In contrast, it appears that `git update-ref --stdin` is fast
> regardless of whether the refs are packed, e.g.
>    git tag -l feature/* | sed -e 's%^%delete refs/tags/%' | git update-ref --stdin
> finishes quickly (order of a few seconds).

Nice! That trick is going in my wiki for devs to use on their VMs.
Thanks for that.
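
For the wiki, the update-ref trick quoted above could be wrapped up
roughly like this. It is only a sketch: the helper names
tags_to_delete_cmds and delete_tags_batched are made up for
illustration and are not part of git. It assumes a POSIX shell and
that every matching tag should be deleted without checking its old
value.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers, not git commands.

# Turn tag names (one per line on stdin) into `git update-ref --stdin`
# instructions of the form "delete refs/tags/<name>".
tags_to_delete_cmds() {
    sed -e 's%^%delete refs/tags/%'
}

# Delete every tag matching the pattern "$1" with a single update-ref
# invocation, so all deletions are applied together instead of one
# ref at a time.
# The pattern is quoted so the shell does not expand it against files
# in the current directory before git sees it.
delete_tags_batched() {
    git tag -l "$1" | tags_to_delete_cmds | git update-ref --stdin
}
```

Usage would be something like `delete_tags_batched 'feature/*'` from
inside the repository. Keeping the sed step in its own function makes
it easy to inspect the generated instructions before piping them into
update-ref.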