Re: [PATCH 2/5] add object-cache infrastructure

Illia Bobyr <Illia.Bobyr@xxxxxxxxxxxxxxxxx> · Tue, 12 Jul 2011 16:52:55 -0500

On 7/12/2011 12:35 AM, Jeff King wrote:
> On Mon, Jul 11, 2011 at 07:14:16PM -0500, Illia Bobyr wrote:
>
>> I am not 100% sure that my solution is exactly about this problem, but
>> it seems to be quite relevant.
>>
>> I think that if you rebase "step-by-step" by doing, for this particular
>> example, something like
>>
>> $ git rebase master^ topic
>> $ git rebase master topic
>>
>> You will first see the /modified one/one/ conflict that you will resolve
>> your "two" against and then your second rebase will apply with no conflicts.
>>
>> I have a set of scripts that help me do this kind of rebases by
>> essentially rebasing the topic branch against every single commit on the
>> upstream.
> That makes a lot of sense to me as a strategy. Of course, as you
> mention, it is horribly slow. And when you do have real conflicts, you
> would end up looking at the same conflicts again and again (and as you
> mention, rerere can be some help there, though not necessarily
> perfect).

Well, I would like to comment on "horribly slow" a little bit :)
It is slower than a normal rebase, but, at least in my case, it is much
faster than resolving the conflicts against the final versions without
the intermediate steps.

Also, in my case, it is faster than compiling every single commit of the
topic branch after the rebase.  That is something I have to do anyway,
because if I compile only the final version I get so many errors all over
the code that I can hardly do anything with them.  Besides, I would like
the commits that were "incorrectly" merged and that introduced the
compilation errors to contain the fixes, rather than have all the fixes
in a final "merge" commit.

In other words, while it is slow, it is just one step in a general
process that, in my case, has steps that are slower.

Also, my guess is that, as rebase is an sh script, it could be made faster
if it were rewritten in C, and a step-by-step rebase would become
considerably faster as well.  Though I have not looked much inside the
script, so I might be wrong.

I would like to note that with hundreds of rebases and dozens of
conflicts, rerere's help is hard to overestimate.  I have rebased my
topic branch through at least 500 commits over the last 6 months, and as I
do not actually copy any changes into the master branch, some conflicts
stay with me the whole time.
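
To make the step-by-step idea concrete, here is a minimal sketch (not my
actual scripts) that only builds the list of commands to run.  The
function name is made up for illustration, and I assume the upstream
commits are given oldest first, for example as printed by
`git rev-list --reverse topic..master`.

```python
from typing import List


def stepwise_rebase_commands(upstream_commits: List[str],
                             topic: str = "topic") -> List[List[str]]:
    """Build one `git rebase <commit> <topic>` command per upstream commit.

    upstream_commits must be ordered oldest first.  rerere is enabled up
    front so that resolutions recorded at one step can be reused later.
    """
    commands = [["git", "config", "rerere.enabled", "true"]]
    for commit in upstream_commits:
        commands.append(["git", "rebase", commit, topic])
    return commands


if __name__ == "__main__":
    # The two-step example quoted earlier in this thread.
    for command in stepwise_rebase_commands(["master^", "master"]):
        print(" ".join(command))
```

Each command would be run in order, stopping whenever a rebase reports a
conflict so that it can be resolved before continuing.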

>> At the same time the less changes are in topic...master the faster it
>> would be and the more changes are there the more you benefit from a
>> gradual rebase.
> Yeah, this seems like the real problem to me. It's one thing to rebase
> on top of a single series that somebody has applied upstream. But if it
> has been 2 weeks, there may be hundreds of commits, and doing hundreds
> of rebases is awful. I wonder if you could do better by picking out some
> "key" commits in master to rebase on top of using one of:
>
>    1. Divide-and-conquer the commit space. Try the rebase, starting on
>       the HEAD. If it works, great. If the user says "this is too hard",
>       then find the midpoint between where we tried to rebase and the
>       merge base, and try rebasing there. Every time it's too hard, go
>       back halfway to the start. Every time it's easy, try the new result
>       on top of HEAD.
>
>       So it's basically doing a O(lg n) search backwards for an easy
>       place to rebase, and then repeatedly checking if that was a good
>       spot (and repeating the backwards search if not). The worst case
>       complexity is O(n lg n) rebases. But in practice, you can hopefully
>       find the problematic spot in O(lg n), and then everything will just
>       work out after 1 or 2 problematic spots.
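
Just to check that I read the search correctly, here is a rough sketch of
it as a simulation.  Nothing here calls git: commits are numbered from 0
(the merge base) to n (the upstream HEAD), and `is_easy` is a
hypothetical callback standing in for "the user did not say this is too
hard".  When only a single commit is left to step over, the sketch
assumes the user has to resolve it no matter what.

```python
from typing import Callable, List, Tuple


def bisecting_rebase(n: int,
                     is_easy: Callable[[int, int], bool]) -> Tuple[List[int], int]:
    """Simulate rebasing from commit 0 up to commit n.

    Returns the list of commits the topic branch was successfully rebased
    onto, and the total number of rebase attempts.
    """
    current = 0        # commit the topic branch currently sits on
    target = n         # commit we are trying to rebase onto
    landed: List[int] = []
    attempts = 0
    while current < n:
        attempts += 1
        # A one-commit step cannot be split further, so it is always taken.
        if is_easy(current, target) or target - current == 1:
            current = target
            landed.append(current)
            target = n             # easy: try the upstream HEAD again
        else:
            # too hard: go back halfway towards where we are now
            target = current + (target - current) // 2
    return landed, attempts


if __name__ == "__main__":
    # Toy rule: a rebase is "easy" only when it skips at most 2 commits.
    print(bisecting_rebase(8, lambda cur, tgt: tgt - cur <= 2))
```

With a history like mine, where almost every step is "hard", the number
of attempts in this simulation grows well past the number of commits.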

I have problems all over the upstream history :)
And the issue is not exactly in finding a problem spot.  It is about
giving the user (that is me in this case) something that he can merge in
a reasonable amount of time.

I have a topic branch with 174 commits.  It takes my machine about 7 
minutes to rebase it.
If I have a conflict that I do not understand, it may take me, on
average, two hours to figure out what the conflict is about and fix it.
Sometimes it may take 4 hours or even more if I have to involve other
developers.
If my machine has to do 20 or even 40 rebases and presents me with
simple conflicts that I can solve in seconds, that is still better than
spending hours trying to figure something out and giving up by saying
"this is too hard".  Note that while it rebases I am working on
something else.
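
As a back-of-the-envelope check of these numbers, assuming every rebase
takes the same 7 minutes (the helper name is made up for illustration):

```python
from typing import Tuple


def machine_time(rebases: int, minutes_per_rebase: int = 7) -> Tuple[int, int]:
    """Return the unattended machine time as (hours, minutes)."""
    return divmod(rebases * minutes_per_rebase, 60)


if __name__ == "__main__":
    for rebases in (20, 40):
        hours, minutes = machine_time(rebases)
        print(f"{rebases} rebases: {hours}h {minutes}m of machine time")
```

So 20 rebases cost about 2h 20m and 40 rebases about 4h 40m of machine
time, which is in the same range as one hard conflict costs me, except
that I do not have to sit there.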

In my case sometimes even the most basic conflicts that arise because of 
a rebase against just one commit may be hard to merge.
If I can avoid even one of these I will be happy to let one of my 
machine cores run for hours :)

Obviously, my case is a weird one, caused by not-the-best development
practices.  But, I guess, one can still consider it as one of the points
on the curve that approximates this kind of use case.  A pretty extreme
point.

>    2. Use heuristics (like commit message content) to find related
>       commits. So if I have a 5-patch series, I can perhaps find the
>       likely commits upstream that match my patches, and those are
>       good places to try individual rebases. And then I don't care how
>       many commits are in master. If I have a 5 patch series, I won't do
>       more than 5 rebases.
>
> But I've never tried this in practice. Maybe next time a rebase is ugly
> I'll manually work through one of the methods and see how it fares.
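
One possible reading of the commit message heuristic, as a small sketch:
match each patch subject against the upstream subjects and keep the best
match above a threshold.  The 0.8 threshold, the function name, and the
use of difflib are all my assumptions here, not something git does.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def match_subjects(patch_subjects: List[str],
                   upstream: List[Tuple[str, str]],
                   threshold: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each patch subject to the most similar upstream commit id.

    upstream is a list of (commit_id, subject) pairs.  A patch maps to
    None when no upstream subject reaches the similarity threshold.
    """
    result: Dict[str, Optional[str]] = {}
    for subject in patch_subjects:
        best_id, best_score = None, threshold
        for commit_id, upstream_subject in upstream:
            score = SequenceMatcher(None, subject.lower(),
                                    upstream_subject.lower()).ratio()
            if score >= best_score:
                best_id, best_score = commit_id, score
        result[subject] = best_id
    return result
```

The commits found this way would be the candidate places to try the
individual rebases, so the number of rebases is bounded by the number of
patches in the series.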

I guess that I view this problem from a slightly different angle, as in
my case it is not exactly my own patches that are causing problems, but
upstream changes that have other changes based on them.

Now I am guessing, but here is another idea.

I think that one can check the modification history of the lines in the
master commits we are rebasing against and in all the topic commits,
similar to what blame does.
Essentially, take the set of all lines that were modified by the topic
branch (along with the context lines) and the sets of lines modified by
every single upstream commit (without the context lines).
If a commit does not touch lines from the topic branch set, it will not
cause conflicts and we can skip it.  It will just cause offsets in the
line numbers when the patches are applied.

If you are rebasing against a lot of changes that are unrelated to your
topic branch, and if they are split into commits correctly, then this
way, I guess, it would be possible to find those that may cause conflicts.
The actual rebase may still be able to go through some of them without
conflicts, but I see this as a first approximation that might save time.
And it seems to err only in one direction: it may flag a commit that
then applies cleanly, but it should not miss one that conflicts.

Kind of a conflict prediction approximation.
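
A minimal sketch of that check, under a big simplifying assumption: all
line numbers are already expressed against the same version of each file
(which is the part that would need the blame-like tracking).  The names,
the commit ids, and the default of 3 context lines are made up for
illustration.

```python
from typing import Dict, List, Set, Tuple

Touched = Dict[str, Set[int]]   # file path -> set of modified line numbers


def with_context(touched: Touched, context: int = 3) -> Touched:
    """Widen every modified line by `context` lines on each side."""
    return {
        path: {n for line in lines
               for n in range(max(1, line - context), line + context + 1)}
        for path, lines in touched.items()
    }


def may_conflict(topic: Touched, commit: Touched) -> bool:
    """True if the commit modifies any line in the topic branch set."""
    return any(topic.get(path, set()) & lines
               for path, lines in commit.items())


def commits_to_stop_at(topic_touched: Touched,
                       upstream: List[Tuple[str, Touched]],
                       context: int = 3) -> List[str]:
    """Return ids of upstream commits that may conflict, in input order."""
    widened = with_context(topic_touched, context)
    return [commit_id for commit_id, touched in upstream
            if may_conflict(widened, touched)]
```

Only the commits returned here would need their own rebase step; all the
others could be skipped over in one go.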

Ilya