Re: [PATCH 4/4] Add git-rewrite-commits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Mon, 9 Jul 2007, Sven Verdoolaege wrote:

> On Mon, Jul 09, 2007 at 12:56:04AM +0100, Johannes Schindelin wrote:
>
> > - you use a lot of what was in cg-admin-rewritehist (including 
> >   adjustments I made in the documentation for filter-branch), but you
> >   also make it more confusing for people used to that tool:
> > 
> > 	- instead of leaving the original branches as they are, you 
> > 	  overwrite them. That's okay. But then you put the originals into 
> > 	  refs/rewritten. Without cg-admin-rewritehist using that name 
> > 	  for the _result_, you could explain your way out of confusion. 
> > 	  As it is, you cannot.
> 
> I thought you had to specify the name of the new branch on the command
> line.  Anyway, I don't really care about the name of this hierarchy.
> I just picked a name that is somewhat related to "rewrite-commits".
> Suggestions are welcome.

How much more explicit should I formulate my suggestion?

> I could also just not create them.  The old values are also available in 
> reflog.

There are two types of convenience.  One for the programmer, and one for 
the user.  As a user I would definitely like to see which branches were 
actually changed.  So even if you _have_ to keep your current behaviour, 
namely that the results are stored under the original names, you should 
store the original refs in something like refs/original (overridable by a 
command line option), but _only if_ they actually changed.

> > 	- in spite of doing the same as cg-admin-rewritehist with filters, 
> > 	  you call them maps. But they are no maps. They are manipulators, 
> > 	  you can call them mutators or filters, too. Given what people 
> > 	  know of cg-admin-rewritehist, you really should keep the name 
> > 	  "filter".
> 
> Nonesense.

Quite a strong word.  Let's see if you can defend it:

> They are not filters 
> (http://en.wikipedia.org/wiki/Filter_%28higher-order_function%29). They 
> are maps (http://en.wikipedia.org/wiki/Map_%28higher-order_function%29).

Aha.  Functional programming.  Since when do we do functional programming 
here?  C++ is much closer to what we do.  So let's see:

http://en.wikipedia.org/wiki/Map_%28C%2B%2B_container%29

Does not apply at all here.  But does something else apply?  Ah, there is 
something about mathematics.  The basis of programming.  Maybe it is 
helpful:

http://en.wikipedia.org/wiki/Map_%28mathematics%29

Ah, no!  It only says that it is a function.  But there is a mention about 
partial maps!  And you said something about that:

> (In cg-admin-rewritehist some of them are (partial) filters,
> but the ones I have are not filters).

Alas, Wikipedia says about a partial function that it is a function which 
maps from one space into another space (not necessarily different spaces, 
though), but does not necessarily have a result for every input.

But in that sense, everything is a map.  Even rewrite-commits.  And 
commits. (Maybe you should rename that to map-maps?)

So maybe more luck with "filter"?  Since we still do not write in Haskell 
or something similar obscure, let's see if Wikipedia has something which 
is more likely applying to us.  Yes.  Software.  That surely 
must apply:

http://en.wikipedia.org/wiki/Filter_%28software%29

and even more to the point: UNIX filters:

http://en.wikipedia.org/wiki/Filter_%28Unix%29

You will see that both definitions apply _perfectly_ to what you do here.  
You pipe the commit into the "map" and write it out as a new commit!  
The _perfect_ example for a UNIX filter.

> I could extend the commit-map to remove the commit it the output is 
> empty, but it'd still be closer to a map than to a filter. (You can map 
> a commit to nothing, but a filter can't alter the elements of a list, it 
> only determines which elements are kept.)

This is not what the commit-filter of cg-admin-rewritehist does.  If at 
all, your commit-map is a better example for a filter than rewritehist's 
commit-filter is.

So, you cannot defend that bold statement.

Enough ranted about an unmerited rant, let's get back to the interesting 
stuff:

All you demonstrated here is that the way "commit-map"s work right now is 
not sufficient to make more complicated rewrite rules.  You cannot split 
one commit in two, and you cannot skip it.

> > 	- the name "map" itself is used in cg-admin-rewritehist, to map 
> > 	  commit names from old to new. By using that name differently, 
> > 	  again you contribute to confusion, for no good reason.
> 
> There is a good reason to call what I call maps maps.  They _are_ maps.
> Still, I'm open for suggestions.

Are you?  Well, my suggestion still stands...  I mean, the suggestion I 
already gave in the post you quoted.  You know, to keep the name as it was 
in the program you based your work on?  Okay, I'll be explicit for once: 
Filter.

> > - get_one_line() is a misnomer. It wants to be named get_linelen().
> 
> Hmmm... I guess you missed Linus' mistake when he introduced it
> in commit.c (e3bc7a3bc7b77f44d686003f5a9346a135529f73).

Yes.  But finger pointing will not help you here.  I reviewed _your_ code.

> Do you want me to rename that one as well?

Be my guest.  But let's do that independently, will ya?

> > - instead of spawning read-tree, you could use unpack_trees() to boost
> >   performance even more. But I guess it is probably left for later, to 
> >   make it easier to review the patch.
> 
> Yeah, it looked a bit tricky for an initial implementation, especially
> where I move the HEAD forward.

Later.

> > - The example you give with "git update-index --remove" can fail, right? 
> 
> Yes.  Spectacularly, even.

Does that mean you acknowledge that the man page should tell about this?  
And probably also reveal the "|| :" remedy?

> > - The commit filter again deviates from the usage in cg-admin-rewritehist. 
> >   I can see that you wanted to make it more versatile. However, it makes 
> >   the tool potentially also a bit more cumbersome to use. Besides, you use 
> >   a temporary file where there is no need to.
> 
> Are you saying I should use two pipes?

Umm.  Why not?

> What if the commit message is larger than the pipe buffer?

You start_command().  Then you write() until it is all written, or the 
pipe is broken.  Then you get the output via index_pipe().  Which is a 
single sha1.  I do not understand your question.

> > - the more fundamental problem with the missing "map", I do not see a 
> >   reasonable way to get the same functionality from any of the code 
> >   snippets passed to rewrite-commits. Indeed, even the workaround of 
> >   cg-admin-rewritehist, to read $TEMP/map/$sha1, does not work, since you 
> >   are keeping it all in memory. On IRC, gitster suggested to use a 
> >   bidirectional pipe (such as stdin/stdout) to get at the new commit 
> >   names, but because of buffering, I guess this is no joy.
> 
> I could add an option to write $TEMP/map/$sha1, but it's not clear
> to me when such a map would be useful.  Please enlighten me.

To be flexible, you have to have a way to ask "what was this commit 
rewritten to?" from within a filter.

Example: maybe you expect the rewritten commit to have a tree identical to 
commits in a certain branch, which was _also_ possibly rewritten.  Maybe 
you want to mark these pairs, for example by a tag.  Then you need to get 
the sha1 of the rewritten commits.

> > As commented on IRC, the env/tree/parent/msg filters of 
> > cg-admin-rewritehist could be all emulated by commit filters. However, 
> > that would be really inconvenient. So at a later stage, these would 
> > have to be integrated into rewrite-commits (even if it would be 
> > possible to drive rewrite-commits by a shell porcelain, but I guess 
> > you are opposed to that idea, since you want to do everything else in 
> > C, too).
> 
> I'm not opposed to running a few commands and connecting stuff in shell. 
> (See git-submodule add, although I admit that I would have preferred to 
> do all of it in C.)

Yes, but your original version was very intrusive and hard to review.  In 
contrast, the simple shell script we have now is hackable by many, and 
easily reviewed.  Indeed, it takes but a couple of minutes to verify that 
it does what it should do and has no side effects.

But I agree, it is easy enough to run the commands from rewrite-commits.c.  
The harder part is to provide an infrastructure which makes it useful.

To enhance on the above example: you're rewriting the commit messages so 
that commit names are rewritten to match the rewritten commits.  That is 
possible by a message filter in cg-admin-rewritehist.

But now somebody comes along, and says "I have a history I need to 
rewrite.  All bug fixes.  The commit names were all abbreviated in the 
commit messages, but they always had 'commit ' in front of them.  I want 
to rewrite them, too."

No chance without a "map" function.  Except to bug you to change the C 
code for this very special need.

By contrast, filter-branch does not need to be changed for that at all.

> > However, the biggest and very real problem is that your filters do not 
> > have a "map" function to get the rewritten sha1 for a given sha1. That 
> > is what makes the filters so versatile, though, since you can skip 
> > revisions by much more complex rules than just greps on the commit 
> > message or header.
> 
> I thought your were opposed to the idea of skipping commits, since
> you still carry along the changes in those commits.

Are you trying to misunderstand me?

You can tell the user how to skip single commits.  But you have to warn 
them that they might be wanting something different, because the word 
"commit" not only implies a "revision", which you would skip in this 
context, but also a "commit diff", which you would _not_.

The "much more complex" rules clearly are for other use cases, which 
brings me to your next question:

> Do you have a use case?

_You_ already mentioned it.  Subdirectory filters.

And if you think a little further, you can easily see that for this to 
become useful, you _have_ to have a "map" function, too.  Because you are 
likely wanting to use rewrite-commits, or filter-branch, to transform big 
repositories into super/subprojects.

> > But hey, maybe it _is_ time to rethink the whole filter business, and 
> > introduce some kind of regular expression based action language. 
> > Something like
> > 
> > 	git rewrite-commits -e '/^author Darl McBribe/skip-commit' \
> 
> What's wrong with --author='!Darl McBribe' ?

It is a very special use case.  Not always will you be able to get all the 
information from the commit object you need for conditional operations.  
My example only showed that you can do the same with that syntax.  But 
imagine what you could do if we just added a small syntactical sugar:

	-e '?has-path:README?substitute/v2/v3/'

> > 		-e 'substitute/^author Joahnnes/author Johannes/header' \
> > 		-e 'substitute/poreclain/porcelain/body' \
> > 		-e 'rewrite-commit-names'
> 
> Hmmm... some of these would basically need a builtin sed. I was thinking 
> about adding --remove and --rename, though.

IMHO you will not get happy by introducing ever more, but still restricted 
functionality.

At the moment, cg-admin-rewritehist can do much more than what 
rewrite-commits can do, because the eval'ed filters give you a much 
greater freedom in what you can do (since "map" is available, and you can 
even modify the commit name mapping by the filters on the go: they can 
edit .git-rewrite/!), and how you can do it (no need to work around 
certain limitations by using temporary files).

A lot of the speed of rewrite-commits stems from the fact that it is 
limited in that way.

Don't get me wrong.  I _want_ rewrite-commits to replace filter-branch.  
But there are fundamental problems with filter-branch, and there are 
fundamental problems with having the same functionality from within C.  
And it appears to me that you do not even try to address the latter 
fundamental problems.

Ciao,
Dscho

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux