Shubham Kanodia <shubham.kanodia10@xxxxxxxxx> writes:

>> ...
>> Thanks.

[administrivia: respond inline, trim out parts that do not have to
be read by bystanders to understand your response].

>> In any case, stepping back a bit, for the population of users who
>> benefit from enabling the prune-remote-refs task, wouldn't it be
>> an even better solution for them to set fetch.prune?  You can
>> tell them to run "git remote prune" just once, set that
>> configuration variable, and then the remote-tracking branches
>> will stay clean from then on.  Any future interactions with the
>> remote make sure stale remote-tracking branches will be removed
>> automatically.  Wouldn't that be a much better option?  I am sure
>> I must be missing a use case where fetch.prune (or
>> remote.<name>.prune) is not a good idea but the background
>> prune-remote-refs task works better.
>
> Let me expand on the context for suggesting this change:
>
> I work with a large repository that has over 50k refs, with about
> 4k new ones added weekly.  We have maintenance scripts on our git
> server that clean up stale refs (unused ones older than N months).
> Using `fetch.prune` with a normal git fetch isn't ideal because it
> would cause git fetch to unnecessarily download many new refs that
> users don't need.  So we actively discourage that.

This is what I did not quite understand.  What do your users
normally do to bring their repository in sync with the remote, if
they are not running "git fetch"?

Side note: it is very likely that your users are not directly
running "git fetch" but various front-ends like "git pull", "git
pull --rebase", or even "repo"; they all at some point call "git
fetch" to get the new objects and update refs.

Ah, are they using "git fetch origin +foo:refs/remotes/origin/foo",
i.e., selectively fetching only the things they use and nothing else
(again, their wrappers may supply the refspec to do the limiting)?
Now it slowly starts to make sense to me (sorry, I am slow,
especially without caffeine in the morning).  Am I following /
guessing your set-up more or less correctly so far?

In any case, if your users are doing selective fetching, 50k refs or
a 4k ref turnover per week on the other side does not really matter.
Your users' desktop repositories won't see remote-tracking refs that
they did not ask for.

But you are right that these selectively fetched refs will
accumulate unless pruned, and fetch.prune would not prune anything
when running

    git fetch origin +foo:refs/remotes/origin/foo

because it will not prune what is outside the hierarchy the refspec
covers, and this is a deliberate design decision.

For "git fetch origin '+refs/heads/*:refs/remotes/origin/*'", which
is pretty much how "git clone" sets up the remotes, anything we have
in the refs/remotes/origin/ hierarchy that does not appear in their
current refs/heads/ hierarchy is pruned with fetch.prune=true.  But
if you fetch selectively, either 'foo' exists (in which case it
won't be pruned), or 'foo' went away (in which case the fetch itself
fails before even pruning what is on our end), so fetch.prune may
not help.  And at least in the short term, periodically running
"remote prune" would be an acceptable workaround for such a
workflow.
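To make the two knobs concrete (with 'origin' standing in for
whatever remote your users' wrappers actually talk to):

    # one-shot clean-up of remote-tracking refs that no longer
    # exist on the other side; 'origin' is a placeholder here, and
    # -n/--dry-run previews the removals without doing them
    git remote prune --dry-run origin
    git remote prune origin

    # ongoing: have full fetches from this remote prune as they go;
    # as noted above, this does not kick in for selective refspecs
    git config remote.origin.prune true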
In the longer term, I suspect that a new option that lets you more
aggressively prune your remote-tracking refs, telling the tool
something like

    git fetch --prune-aggressive origin +refs/heads/foo:refs/remotes/origin/foo

to mean "I am only interested in getting the objects to complete
their current 'foo' branch, and in getting my remote-tracking ref
for that branch updated, BUT if you notice any refs in my
refs/remotes/origin/* that they do not have in their refs/heads/*,
please prune them, even when they are not 'foo' (which means the
normal --prune would not prune them)", would not be a terrible idea.
It would be more involved than running "remote prune" periodically,
of course.

> In theory, users could just run `git remote prune` once and
> carefully avoid full fetches to keep their local ref count low.
> However, in practice, we've found that full fetches happen through
> various indirect means:
>
> - Shell plugins like zsh/pure
> - Git GUIs like Sourcetree
> - Code editors like VSCode
>
> among others.

And do any of these bypass the underlying "git fetch"?  If not, then
one easier solution is to accept that somebody will do the regular
refs/heads/*:refs/remotes/origin/* full fetch *anyway*.  Once we
accept that, we can tell "git fetch" to always prune.  Then when
these "various indirect means" attempt their full fetch, the "git
fetch" they invoke would still honor fetch.prune, so even though
they would try to keep these 50k refs in sync with the remote (which
means you may see 4k new refs per week), the refs that got retired
on the remote would be seen as stale and removed from the users'
repositories.

> - If full `git fetch` is completely avoided, this will gradually
>   reduce the local ref count from tens of thousands to just a few
>   hundred active refs (even if the remote has 50k+ active refs) as
>   old branches on the remote expire with time.

Yes.  You'd somehow need to arrange these third-party tools not to
fetch too much unneeded cruft.

> - Even if not (say, an errant tool or the developer executes `git
>   fetch` mistakenly), the maintenance job ensures this doesn't
>   become their permanent state until the next manual remote prune.

For the latter case, fetch.prune=true would be the ideal solution, I
would imagine.  The "errant tool"'s "git fetch" would prune the
stale ones.

Now, the documentation should explain when this "periodically
running remote prune" is an acceptable workaround and/or a useful
solution, relative to setting fetch.prune, as most of the existing
documentation assumes that the users, the intended audience of the
document, are using the bog-standard "git clone" result, which
copies all of the remote's branches to remote-tracking branches.

Thanks.