Re: [PATCH] builtin-apply: keep information about files to be deleted

Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> · Sat, 18 Apr 2009 22:58:47 +0200

Junio C Hamano <gitster@xxxxxxxxx> wrote:

> Michał Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> writes:
>   
>> ... However, there are some cases when these two
>> rules may cause problems:
>>
>> patch #1: rename A to B
>> patch #2: rename C to A
>> patch #3: modify A
>>
>> Should patch #3 modify B (which was A) or A (which was C)?
>>
>> patch #1: rename A to B
>> patch #2: rename B to A
>> patch #3: modify A
>> patch #4: modify B
>>
>> Which files should be patched by #3 and #4?
>>
>> In my opinion both #3 and #4 should fail (or both should succeed) --
>> with my patch only #3 will work and #4 will be rejected, because in
>> #2 B was marked as deleted.  
> 
> Both of the examples above cannot be emitted as a single commit by
> format-patch; the user is feeding a combined patch.  Perhaps renames
> in each example sequence were came from one git commit but
> modifications are from separate commit or handcrafted "follow-up"
> patch.

Yes, that's true. In "normal" case, renames and modifications should be
handled properly and (generally) aren't subject of this discussion.

>
> There are two stances we can take:
> 
>  (1) The user knows what he is doing.
> 
>      In the first example, if he wanted the change in #3 to end up in
> B, he would have arranged the patches in a different order, namely, 3
> 1 2, but he didn't.  We should modify A (that came from C).
> 
>  (2) In situations like these when it is unusual and there is no
> clear and unambiguous answer, the rule has always been "fail and ask
> the user to clean up", because silently doing a wrong thing in an
> unusual situation that happens only once in a while is far worse than
>      interrupting the user and forcing a manual intervention.
> 
>      In the first example, there is no clear answer.  Perhaps all
> three patches were independent patches (the first two obviously came
> from git because only we can do renames, but they may have been
> separate commits), and the user may have reordered them (or just
> picked a random order because he was linearizing a history with a
> merge).
> 
> The second one is even iffier.  If we _know_ that originally patch #1
> and #2 came from the same commit, then they represent swapping
> between A and B, but if they came from different git commits, and if
> the user picked patches in a random order, it may mean something
> completely different.

The problem here is that there are at least two patches which touch the
same file(s) and it is impossible to say which patches should be handled
atomically. However, there is no easy way to specify renames as a
single patch. A diff containing swapping of three files looks like this:

	diff --git a/file2 b/file1
	similarity index 100%
	rename from file2
	rename to file1
	diff --git a/file3 b/file2
	similarity index 100%
	rename from file3
	rename to file2
	diff --git a/file1 b/file3
	similarity index 100%
	rename from file1
	rename to file3

BTW: it applies correctly :).

> 
> I am somewhat tempted to say that we should fail all of them,
> including the original "single patch swapping files" brought up by
> Linus.

I may agree that difficult scenarios should be rejected, but I will
also say that git-apply should always accept git-diff output.

> 
> BUT
> 
> Can we make use simple rule to detect problematic cases?
> 
>  - An input to git-apply can contain more than one patch that affects
> a path; however
> 
>    - you cannot create a path that still exists, except for a path
> that _will_ be renamed away or removed (your patch fixes this by
> adding this "except for..." part to loosen the existing rule);
> 
>    - you cannot modify a path in a separate patch if it is involved
> in an either side of a rename (this will catch the ambiguity of patch
> #3 in your first example and #3 and #4 in your second example);

What should happen in following situation:

patch #1: modify A
patch #2: rename A to B

#2 should fail? Now it creates new B which is a copy of A before
applying any patches and modifies A according to #1.

AFAIC, copies and renames are handled differently from normal
modifications (in_fn_table() is not used for them, but
add_to_fn_table() is, so "rename patches don't look in the past, but
have influence upon the future").

> 
>  - In addition:
> 
>    - the same path cannot be renamed from more than once (this will
> catch concatenation of two git generated patches);
> 
> With such a change, I think we can keep the safety of "when there are
> more than one plausible outcomes, the tool shouldn't silently decide,
> nor make progress that the user later needs to undo and redo", while
> allowing a sane use of rename patches generated out of a git commit.
> 
> 

Do you mean that patches which break above rules should be
skipped when "--reject" is set, as other failures? Or that
whole git-apply should fail regardless of "--reject"?

Michal Kiedrowicz
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html