Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere

Hi Elijah,

On Fri, 30 Aug 2019, Elijah Newren wrote:

> On Fri, Aug 30, 2019 at 1:40 PM Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
>
> > [...]
> > In my most recent instance of this, I wanted to publish the script I
> > used to use for submitting patch series to the Git mailing list,
> > maintaining tags for iterations and generating cover letters from branch
> > descriptions and interdiffs (this script eventually became GitGitGadget,
> > https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
> > demonstrates this).
> >
> > To do that, I ran a `git filter-branch` in the repository where I track
> > all the scripts I deem unsuitable for public consumption, to remove all
> > files but `mail-patch-series.sh`, then pushed it to
> > https://github.com/dscho/mail-patch-series
> >
> > Please note that most crucially, I wanted to rewrite a newly-created
> > branch, and only that branch.
> >
> > Could I have done the same using `git fast-export`, filtering the output
> > with a Perl script, then passing it to `git fast-import`? Sure, I was
> > really tempted to do that. In the end, it took less of _my_ time to just
> > let `git filter-branch` do its work with a not-too-complicated index
> > filter.
>
> Why a perl script?  Shouldn't
>     git fast-export [--no-data] HEAD -- $PATH | git fast-import --force --quiet
> do the trick?  And it's probably simpler and shorter than the index
> filter you used.

Does that not keep the full `$PATH`? I wanted the resulting branch to
have the file in the top-level directory.
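
A quick way to check this is to try it in a throwaway repository. The
sketch below assumes a hypothetical layout with the script stored under
`scripts/`; the branch name `demo` and all file names are made up for
illustration. It imports into a separate, empty repository, so
`--no-data` is left out (the blobs have to travel with the stream).

```shell
#!/bin/sh
# Sketch: does a path-limited fast-export keep the full path?
# (Hypothetical layout; nothing here comes from the real repository.)
set -e
tmp=$(mktemp -d)
git init -q "$tmp/src"
cd "$tmp/src"
git symbolic-ref HEAD refs/heads/demo
mkdir scripts
echo 'echo hello' >scripts/mail-patch-series.sh
echo 'private' >scripts/other.sh
git add scripts
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q -m 'add scripts'

# Export only the one path and import it into a fresh repository.
git init -q "$tmp/dst"
git fast-export demo -- scripts/mail-patch-series.sh |
	git -C "$tmp/dst" fast-import --force --quiet

# List every path in the imported branch.
exported_paths=$(git -C "$tmp/dst" ls-tree -r --name-only demo)
echo "$exported_paths"
```

The listing is expected to show the file still under `scripts/`, with
the other file filtered out. Moving it to the top-level directory would
need a separate renaming step.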

> That said, yeah it'd be nice to get automatic rewriting of commit
> hashes in commit messages and other niceties from filter-repo (e.g.
> future automatic reattaching of notes to the rewritten commits).  Some
> questions:
>
>   * What's the backup strategy in case you specify the wrong filters
> (e.g. you have a typo in the pathnames)?  filter-repo encourages folks
> to make a clone and then filter the fresh clone, because if anything
> goes awry, you can just delete and restart.  (I am heavily opposed to
> the refs/original/ backup mechanism used by filter-branch, for
> multiple reasons.)  Is your safety stance just "If I mess up it's my
> own fault; do the rewrite?"  Or are you okay with cloning before
> filtering?

Please note that the `refs/original/` refs should no longer have been
written at all once reflogs were introduced.

Incidentally, that is my answer to your question: the reflog is my
backup.
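
As a minimal sketch of what that looks like in practice, assuming a
throwaway repository and a made-up branch name `demo`: the amended
commit below merely stands in for a rewrite that went wrong, and the
branch is then restored from its reflog.

```shell
#!/bin/sh
# Sketch: recover a branch from its reflog after a botched rewrite.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/demo
git config user.name Demo
git config user.email demo@example.com

echo one >file
git add file
git commit -q -m 'original commit'
before=$(git rev-parse demo)

# Stand-in for a botched rewrite: any command that moves the branch is
# recorded in the branch's reflog (enabled by default in non-bare
# repositories).
echo typo >file
git commit -q -a --amend -m 'botched rewrite'

# demo@{1} is where the branch pointed before its most recent update.
git reset -q --hard 'demo@{1}'
after=$(git rev-parse demo)
echo "before=$before after=$after"
```

Note that this only helps as long as the reflog entries have not been
expired.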

>   * If you're okay with cloning before filtering...then is there an
> issue with rewriting all branches, and just pushing the one you need?
> (Is there an issue with "this branch is small, the others are huge,
> and filter-branch is slow -- so rewriting one branch saves me lots of
> time"?  Or are there other issues at play too?)

I am not okay with cloning before filtering.

First of all, it is wasteful.

Second of all, in my case it would have been *particularly* wasteful
because the repository in question also has quite a few quite large
blobs (hysterical raisins, don't ask).

>   * What if the user has auxiliary information for the branch in other
> refs?  For example, git-notes pointing at any of the commits, or tags
> in the history of the branch that might be relevant, or perhaps even
> replace refs in combination with GIT_NO_REPLACE_OBJECTS=1?  Is this an
> "I don't care, toss that stuff and just rewrite just this branch?"

In my case: there are no notes. The only time when I make heavy use of
notes is in GitGitGadget. I don't use that feature otherwise.

>   * filter-repo by default creates new replace references so that you
> can refer to new commit IDs using old (unabbreviated) commit IDs.
> Would that be considered helpful for this usecase?  unhelpful?
> irrelevant, since you'll just push the branch you want somewhere and
> nuke the temporary clone?

I did not need that mapping in any of my `git filter-branch` use
cases.

Of course, I can see how it can come in handy in other circumstances,
just not in the ones I experienced so far.

> I'm not by any means ruling out the possibility of documenting --refs
> and adjusting the defaults when it is used so the user can just run
> something like
>    git filter-repo --path $PATH --refs $MYBRANCH
> but I feel like I need to understand answers to questions like the
> above ones so that I can know how to phrase warnings and adjust
> defaults and update the documentation.

In all the scenarios where I used `git filter-branch` (some dozen per
year, so not all *that* many), I needed to rewrite one particular
branch, typically a freshly-created one. I never, ever ever needed to
rewrite all the refs in the repository. Not once ;-)

> > In another instance, a long, long time ago, I needed to restart a
> > repository which had included way too many files for its own good, then
> > rename the old repository and start with a fresh `master` that contained
> > but a single commit whose tree was identical to the previous `master`'s
> > tip commit. I simply grafted that commit, ran `git filter-branch` and
> > had precisely what I needed.
>
> filter-repo supports grafts and replace objects, the same as
> filter-branch.  (Although, technically, I didn't have to do a thing to
> support it; fast-export does the special handling of rewriting based
> on grafts and replace objects.)  So, I'd say this is fully supported.
>
> Side question: the git-replace documents suggest that the graft file
> is deprecated.  Are there any timeframes or plans for phasing out
> beyond the git-replace manpage existing?  Should I avoid documenting
> the graft file support in filter-repo?  Should I include examples
> using not just git-replace but also using the graft file?

I had meant to prepare a patch series to remove `grafts` support that
Junio could carry in `pu` until the time he considers it appropriate to
merge to `master`, but it seems that this task fell through the cracks.

The deprecation itself was introduced in tags/v2.18.0-rc0~54^2~4,
i.e. it is official as of Git v2.18.0, which was released in mid-June
last year.

My personal gut feeling is that we should let it simmer for another year
before removing support for the `grafts` file (and we may want to update
the label "grafted" when `git log` shows a shallow commit before we
remove that support for `grafts`).

So I'll not work on that patch for now.
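
For readers who still rely on the `grafts` file, the same effect is
available through replace refs. The sketch below is only an
illustration in a throwaway repository (the branch name `demo` and the
three commits are made up). It makes the tip commit look like a root
commit, much like the graft in the restart scenario quoted above.

```shell
#!/bin/sh
# Sketch: truncate history with a replace ref instead of the grafts file.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/demo
git config user.name Demo
git config user.email demo@example.com

for n in 1 2 3
do
	echo "$n" >file
	git add file
	git commit -q -m "commit $n"
done

# With no parents listed, --graft makes the tip look like a root commit.
git replace --graft demo

# History as seen with and without the replace ref in effect.
grafted_count=$(git rev-list --count demo)
real_count=$(git --no-replace-objects rev-list --count demo)
echo "with replace refs: $grafted_count, without: $real_count"
```

An existing `grafts` file can be converted with
`git replace --convert-graft-file`, which is what the deprecation
warning suggests.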

> > I would be _delighted_ if these kinds of use case (rewriting a branch,
> > or even just a commit range) became more of a first-class citizen with
> > `git filter-repo`.
>
> I've got all the pieces for supporting a single branch or a commit
> range (e.g. 'git filter-repo --path foo --refs ^master~4 ^stable~23
> mybranch'), but the defaults (error out unless in a bare repo, move
> refs/remotes/origin/* to refs/heads/*, disconnect origin remote,
> expire reflogs & repack & prune, create new replace references so
> folks can access new commits using old commit IDs) may be somewhat
> friction-filled for this usecase.  Those defaults other than the new
> replace refs happen to all be turned off with the combination of
> --force and --target, so, assuming turning them off is what you need,
> you could cheat and just specify 'git filter-repo --force --target .
> --refs $MYBRANCH' today and perhaps get what you want, but that's a
> really non-intuitive command line that is way too ugly to recommend.
> And I don't want to tie myself to '--target .' being the magic sauce
> in the future either.

I agree. I would love for my use cases to become first-class
citizens. Maybe `--branch <branch>` could serve as the knob?

What I also found really helpful in `git filter-branch` is that it was
possible to pass one-liner shell scripts directly to the command, giving
a lot of freedom about the transformations. I understand that Python
makes it hard to write spaghetti-code one-liners, so you cannot really
pass the snippet in via the command line, but I hope there is a way to
script things in `git filter-repo`?

Ciao,
Dscho



