Hi Elijah,

On Fri, 30 Aug 2019, Elijah Newren wrote:

> On Fri, Aug 30, 2019 at 1:40 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>
> > [...]
> > In my most recent instance of this, I wanted to publish the script I
> > used to use for submitting patch series to the Git mailing list,
> > maintaining tags for iterations and generating cover letters from branch
> > descriptions and interdiffs (this script eventually became GitGitGadget,
> > https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
> > demonstrates this).
> >
> > To do that, I ran a `git filter-branch` in the repository where I track
> > all the scripts I deem unsuitable for public consumption, to remove all
> > files but `mail-patch-series.sh`, then pushed it to
> > https://github.com/dscho/mail-patch-series
> >
> > Please note that most crucially, I wanted to rewrite a newly-created
> > branch, and only that branch.
> >
> > Could I have done the same using `git fast-export`, filtering the output
> > with a Perl script, then passing it to `git fast-import`? Sure, I was
> > really tempted to do that. In the end, it took less of _my_ time to just
> > let `git filter-branch` do its work with a not-too-complicated index
> > filter.
>
> Why a perl script? Shouldn't
>   git fast-export [--no-data] HEAD -- $PATH | git fast-import --force --quiet
> do the trick? And it's probably simpler and shorter than the index
> filter you used.

Does that not keep the full `$PATH`? I wanted the resulting branch to
have the file in the top-level directory.

> That said, yeah it'd be nice to get automatic rewriting of commit
> hashes in commit messages and other niceties from filter-repo (e.g.
> future automatic reattaching of notes to the rewritten commits). Some
> questions:
>
> * What's the backup strategy in case you specify the wrong filters
> (e.g. you have a typo in the pathnames)?
> filter-repo encourages folks
> to make a clone and then filter the fresh clone, because if anything
> goes awry, you can just delete and restart. (I am heavily opposed to
> the refs/original/ backup mechanism used by filter-branch, for
> multiple reasons.) Is your safety stance just "If I mess up it's my
> own fault; do the rewrite?" Or are you okay with cloning before
> filtering?

Please note that the `refs/original/` refs should not have been written
at all anymore, not after reflogs were introduced. Incidentally, that is
my answer to your question: the reflog is my backup.

> * If you're okay with cloning before filtering...then is there an
> issue with rewriting all branches, and just pushing the one you need?
> (Is there an issue with "this branch is small, the others are huge,
> and filter-branch is slow -- so rewriting one branch saves me lots of
> time"? Or are there other issues at play too?)

I am not okay with cloning before filtering. First of all, it is
wasteful. Second of all, in my case it would have been *particularly*
wasteful because the repository in question also has quite a few quite
large blobs (hysterical raisins, don't ask).

> * What if the user has auxiliary information for the branch in other
> refs? For example, git-notes pointing at any of the commits, or tags
> in the history of the branch that might be relevant, or perhaps even
> replace refs in combination with GIT_NO_REPLACE_OBJECTS=1? Is this an
> "I don't care, toss that stuff and just rewrite just this branch?"

In my case: there are no notes. The only time when I make heavy use of
notes is in GitGitGadget. I don't use that feature otherwise.

> * filter-repo by default creates new replace references so that you
> can refer to new commit IDs using old (unabbreviated) commit IDs.
> Would that be considered helpful for this usecase? unhelpful?
> irrelevant, since you'll just push the branch you want somewhere and
> nuke the temporary clone?

I definitely did not need that mapping in all of my `git filter-branch`
use cases. Of course, I can see how it can come in handy in other
circumstances, just not in the ones I experienced so far.

> I'm not by any means ruling out the possibility of documenting --refs
> and adjusting the defaults when it is used so the user can just run
> something like
>   git filter-repo --path $PATH --refs $MYBRANCH
> but I feel like I need to understand answers to questions like the
> above ones so that I can know how to phrase warnings and adjust
> defaults and update the documentation.

In all the scenarios where I used `git filter-branch` (some dozen per
year, so not all *that* many), I needed to rewrite one particular
branch, typically a freshly-created one.

I never, ever ever needed to rewrite all the refs in the repository. Not
once ;-)

> > In another instance, a long, long time ago, I needed to restart a
> > repository which had included way too many files for its own good, then
> > rename the old repository and start with a fresh `master` that contained
> > but a single commit whose tree was identical to the previous `master`'s
> > tip commit. I simply grafted that commit, ran `git filter-branch` and
> > had precisely what I needed.
>
> filter-repo supports grafts and replace objects, the same as
> filter-branch. (Although, technically, I didn't have to do a thing to
> support it; fast-export does the special handling of rewriting based
> on grafts and replace objects.) So, I'd say this is fully supported.
>
> Side question: the git-replace documents suggest that the graft file
> is deprecated. Are there any timeframes or plans for phasing out
> beyond the git-replace manpage existing? Should I avoid documenting
> the graft file support in filter-repo? Should I include examples
> using not just git-replace but also using the graft file?

I had meant to prepare a patch series to remove `grafts` support that
Junio could carry in `pu` until the time he considers it appropriate to
merge to `master`, but it seems that this task fell under the rug.

The deprecation itself has been introduced in
tags/v2.18.0-rc0~54^2~4, i.e. it is official as of Git v2.18.0, which
was released in mid-June last year.

My personal gut feeling is that we should let it simmer for another year
before removing support for the `grafts` file (and we may want to update
the label "grafted" when `git log` shows a shallow commit before we
remove that support for `grafts`).

So I'll not work on that patch for now.

> > I would be _delighted_ if these kinds of use case (rewriting a branch,
> > or even just a commit range) became more of a first-class citizen with
> > `git filter-repo`.
>
> I've got all the pieces for supporting a single branch or a commit
> range (e.g. 'git filter-repo --path foo --refs ^master~4 ^stable~23
> mybranch'), but the defaults (error out unless in a bare repo, move
> refs/remotes/origin/* to refs/heads/*, disconnect origin remote,
> expire reflogs & repack & prune, create new replace references so
> folks can access new commits using old commit IDs) may be somewhat
> friction-filled for this usecase. Those defaults other than the new
> replace refs happen to all be turned off with the combination of
> --force and --target, so, assuming turning them off is what you need,
> you could cheat and just specify 'git filter-repo --force --target .
> --refs $MYBRANCH' today and perhaps get what you want, but that's a
> really non-intuitive command line that is way too ugly to recommend.
> And I don't want to tie myself to '--target .' being the magic sauce
> in the future either.

I agree. I would love for my use cases to become more of first-class
citizens. Maybe `--branch <branch>` could serve as the knob?

What I also found really helpful in `git filter-branch` is that it was
possible to pass one-liner shell scripts directly to the command, giving
a lot of freedom about the transformations. I understand that Python
makes it hard to write spaghetti-code one-liners, so you cannot really
pass the snippet in via the command-line, but I hope there is a way to
script things in `git filter-repo`?

Ciao,
Dscho
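
For illustration, here is a minimal sketch of the kind of one-liner index filter discussed above, applied to a single freshly-created branch. It is not the command that was actually run: the throwaway repository, the branch name `publish` and the file `private.sh` are hypothetical stand-ins, and the filter shown (empty the index, then restore one file from the commit being rewritten) is just one common way to keep a single file.

```shell
#!/bin/sh
# Sketch only: every name below (the throwaway repo, the branch "publish",
# the file "private.sh") is a hypothetical stand-in, not the real setup.
set -e

# Build a small throwaway repository to experiment in.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo 'echo private' >private.sh
echo 'echo v1' >mail-patch-series.sh
git add .
git commit -q -m "initial"
echo 'echo v2' >mail-patch-series.sh
git commit -q -a -m "update mail-patch-series.sh"

orig_branch=$(git symbolic-ref --short HEAD)
orig_tip=$(git rev-parse HEAD)

# Rewrite a freshly created branch, and only that branch.
git checkout -q -b publish

# The index filter is a shell one-liner: empty the index, then put back the
# single file to keep, as recorded in the commit being rewritten.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter \
    'git rm -q -r --cached . && git reset -q "$GIT_COMMIT" -- mail-patch-series.sh' \
    -- publish >/dev/null

# The rewritten branch now contains only the one file...
git ls-tree -r --name-only publish
# ...the original branch still points at its old tip...
git rev-parse "$orig_branch"
# ...and the reflog of the rewritten branch remembers the pre-rewrite tip.
git rev-parse 'publish@{1}'
```

Because only `publish` is named after the `--`, no other ref is rewritten, and the pre-rewrite tip stays reachable through the branch's reflog, which is the backup strategy described earlier in this mail.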