Re: [RFC PATCH 0/5] Remove git-filter-branch from git.git; host it elsewhere

Hi Dscho,

On Fri, Aug 30, 2019 at 1:40 PM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Elijah,
>
>
> On Wed, 28 Aug 2019, Elijah Newren wrote:
>
> > Hi Sergey,
> >
> > On Wed, Aug 28, 2019 at 1:52 AM Sergey Organov <sorganov@xxxxxxxxx> wrote:
> > >
> > > Elijah Newren <newren@xxxxxxxxx> writes:
> > >
> > > > On Tue, Aug 27, 2019 at 1:43 AM Sergey Organov <sorganov@xxxxxxxxx> wrote:
> > > >>
> > > >> Eric Wong <e@xxxxxxxxx> writes:
> > > >>
> > > >>
> > > >> [...]
> > > >>
> > > >> > AFAIK, filter-branch is not causing support headaches for any
> > > >> > git developers today.  With so many commands in git, it's
> > > >> > unlikely newbies will ever get around to discover it :)
> > > >> > So I don't think we should be in any rush to remove it.
> > > >>
> > > >> Nah, discovering it is simple. Just Google for "git change author". That
> > > >> eventually leads to a script that uses "git filter-branch --env-filter"
> > > >> to get the job done, and I'm afraid it is spread all over the world.
> > > >>
> > > >> See, e.g.:
> > > >>
> > > >> https://help.github.com/en/articles/changing-author-info
> > > >
> > > > Side note: Is the goal to "fix names and email addresses in this
> > > > repository"?  If so, this guide fails: it doesn't update tagger names
> > > > or email addresses.  Indeed, filter-branch doesn't provide a way to do
> > > > that.  (Not to mention other problems like not updating references to
> > > > commit hashes in commit messages when it is busy rewriting everything.)
> > >
> > > No. Maybe the original goal was like that, but I, personally, use a
> > > modified version of this to change my "Author" credentials from
> > > "internal" to "public" in branches that I'm going to send upstream, so
> > > the actual aim is to change e-mail of particular Author from a@b to c@d
> > > in all the commits in a (feature) branch.
> >
> > There's an interesting usecase I hadn't heard of or thought of before.
>
> I'll throw in another use case that's kinda related: extracting the
> history of one file (or subdirectory).

Thanks for sending these along!  I do have some comments, and a bunch
of questions...

> In my most recent instance of this, I wanted to publish the script I
> used to use for submitting patch series to the Git mailing list,
> maintaining tags for iterations and generating cover letters from branch
> descriptions and interdiffs (this script eventually became GitGitGadget,
> https://github.com/gitgitgadget/gitgitgadget/commits?after=6fb0ede48f86e729292ee1542729bc0f5a30cfa6+0
> demonstrates this).
>
> To do that, I ran a `git filter-branch` in the repository where I track
> all the scripts I deem unsuitable for public consumption, to remove all
> files but `mail-patch-series.sh`, then pushed it to
> https://github.com/dscho/mail-patch-series
>
> Please note that most crucially, I wanted to rewrite a newly-created
> branch, and only that branch.
>
> Could I have done the same using `git fast-export`, filtering the output
> with a Perl script, then passing it to `git fast-import`? Sure, I was
> really tempted to do that. In the end, it took less of _my_ time to just
> let `git filter-branch` do its work with a not-too-complicated index
> filter.

Why a Perl script?  Shouldn't
    git fast-export [--no-data] HEAD -- $PATH | git fast-import --force --quiet
do the trick?  And it's probably simpler and shorter than the index
filter you used.
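
To illustrate, here is a minimal sketch of that pipeline run against a
throwaway repository.  Everything named here is made up for the
example: the files keep.sh and other.txt, the branch name main, and
the temporary directories.  It imports into a fresh bare repository
rather than back into the source one, which is why --force is not
needed in this variant.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Source repository: three commits, only two of which touch keep.sh.
git init --quiet "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/main   # hypothetical branch name
git config user.name "Example"
git config user.email "example@example.com"

echo a > keep.sh
echo b > other.txt
git add keep.sh other.txt
git commit --quiet -m "add both files"

echo c >> other.txt
git commit --quiet -am "touch only other.txt"

echo d >> keep.sh
git commit --quiet -am "touch keep.sh"

# Export only the history of keep.sh and import it into a new bare repo.
git init --quiet --bare "$work/dst.git"
git fast-export HEAD -- keep.sh |
    git --git-dir="$work/dst.git" fast-import --quiet

# Show what arrived on the other side.
git --git-dir="$work/dst.git" log --format=%s main
git --git-dir="$work/dst.git" ls-tree --name-only main
```

The commit that only touched other.txt should be dropped by the path
limiting, and the imported tree should contain nothing but keep.sh.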

That said, yeah it'd be nice to get automatic rewriting of commit
hashes in commit messages and other niceties from filter-repo (e.g.
future automatic reattaching of notes to the rewritten commits).  Some
questions:

  * What's the backup strategy in case you specify the wrong filters
    (e.g. you have a typo in the pathnames)?  filter-repo encourages
    folks to make a clone and then filter the fresh clone, because if
    anything goes awry, you can just delete it and restart.  (I am
    heavily opposed to the refs/original/ backup mechanism used by
    filter-branch, for multiple reasons.)  Is your safety stance just
    "If I mess up it's my own fault; do the rewrite"?  Or are you okay
    with cloning before filtering?
  * If you're okay with cloning before filtering... then is there an
    issue with rewriting all branches, and just pushing the one you
    need?  (Is the issue "this branch is small, the others are huge,
    and filter-branch is slow -- so rewriting one branch saves me lots
    of time"?  Or are there other issues at play too?)
  * What if the user has auxiliary information for the branch in other
    refs?  For example, git-notes pointing at any of the commits, or
    tags in the history of the branch that might be relevant, or
    perhaps even replace refs in combination with
    GIT_NO_REPLACE_OBJECTS=1?  Is this an "I don't care, toss that
    stuff and rewrite just this branch" situation?
  * filter-repo by default creates new replace references so that you
    can refer to new commit IDs using old (unabbreviated) commit IDs.
    Would that be considered helpful for this usecase?  Unhelpful?
    Irrelevant, since you'll just push the branch you want somewhere
    and nuke the temporary clone?


I'm not by any means ruling out the possibility of documenting --refs
and adjusting the defaults when it is used so the user can just run
something like
   git filter-repo --path $PATH --refs $MYBRANCH
but I feel like I need to understand answers to questions like the
above ones so that I can know how to phrase warnings and adjust
defaults and update the documentation.

> In another instance, a long, long time ago, I needed to restart a
> repository which had included way too many files for its own good, then
> rename the old repository and start with a fresh `master` that contained
> but a single commit whose tree was identical to the previous `master`'s
> tip commit. I simply grafted that commit, ran `git filter-branch` and
> had precisely what I needed.

filter-repo supports grafts and replace objects, the same as
filter-branch.  (Although, technically, I didn't have to do a thing to
support it; fast-export does the special handling of rewriting based
on grafts and replace objects.)  So, I'd say this is fully supported.
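
As a rough sketch of the "single commit with the same tree as the old
tip" case, using only git replace and fast-export: the repository
layout, the file name file.txt and the branch name main below are
invented for the example, and it writes the result into a separate
bare repository instead of rewriting in place.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Source repository with a short linear history.
git init --quiet "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/main   # hypothetical branch name
git config user.name "Example"
git config user.email "example@example.com"
for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit --quiet -m "commit $n"
done

# Graft the tip so it has no parents: a replace ref now points at a
# parentless copy of the tip commit with the same tree.
git replace --graft HEAD

# fast-export reads history through the replace ref, so the exported
# stream contains only the grafted, parentless commit.
git init --quiet --bare "$work/dst.git"
git fast-export HEAD | git --git-dir="$work/dst.git" fast-import --quiet

git --git-dir="$work/dst.git" log --format=%s main
```

The new repository should end up with one root commit whose tree
matches the old tip, with the graft made permanent by the export.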

Side question: the git-replace documentation suggests that the graft
file is deprecated.  Are there any timeframes or plans for phasing it
out, beyond the note in the git-replace manpage?  Should I avoid
documenting the graft file support in filter-repo?  Should I include
examples using not just git-replace but also the graft file?

> I would be _delighted_ if these kinds of use case (rewriting a branch,
> or even just a commit range) became more of a first-class citizen with
> `git filter-repo`.

I've got all the pieces for supporting a single branch or a commit
range (e.g. 'git filter-repo --path foo --refs ^master~4 ^stable~23
mybranch'), but the defaults (error out unless in a bare repo, move
refs/remotes/origin/* to refs/heads/*, disconnect origin remote,
expire reflogs & repack & prune, create new replace references so
folks can access new commits using old commit IDs) may be somewhat
friction-filled for this usecase.  Those defaults other than the new
replace refs happen to all be turned off with the combination of
--force and --target, so, assuming turning them off is what you need,
you could cheat and just specify 'git filter-repo --force --target .
--refs $MYBRANCH' today and perhaps get what you want, but that's a
really non-intuitive command line that is way too ugly to recommend.
And I don't want to tie myself to '--target .' being the magic sauce
in the future either.
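
For anyone unsure what a range like '^master~4 mybranch' selects, it
is ordinary rev-list syntax and can be previewed with plain git before
any rewriting.  A small sketch, with invented branch names (main,
mybranch) and empty commits in a scratch repository; it says nothing
about filter-repo's own defaults:

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

git init --quiet "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main   # hypothetical branch name
git config user.name "Example"
git config user.email "example@example.com"

# Three commits on main, then two more on a topic branch.
for n in 1 2 3; do
    git commit --quiet --allow-empty -m "main $n"
done
git checkout --quiet -b mybranch
for n in 1 2; do
    git commit --quiet --allow-empty -m "branch $n"
done

# Commits reachable from mybranch, minus everything reachable from main~1.
selected=$(git rev-list --count mybranch ^main~1)
git log --format=%s mybranch ^main~1
echo "selected $selected commits"
```

Here the range leaves out "main 1" and "main 2" and keeps the
remaining three commits, which is the set a rewrite limited to those
arguments would walk.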


