Re: [PATCH] contrib/subtree: Split history with empty trees correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/20/2016 05:05 AM, David A. Greene wrote:
> Marcus Brinkmann <m.brinkmann@xxxxxxxxxxxx> writes:
> 
>> 'git subtree split' will fail if the history of the subtree has empty
>> tree commits (or trees that are considered empty, such as submodules).
>> This fix keeps track of this condition and correctly follows the history
>> over such commits.
> 
> Thanks for working on this!  Please add a test to t7900-subtree.sh.

I couldn't get the tests to run and I couldn't find documentation on how
to run it.  If you enlighten me I can add a test :)

>> @@ -625,12 +629,16 @@ cmd_split()
>>  		
>>  		# ugly.  is there no better way to tell if this is a subtree
>>  		# vs. a mainline commit?  Does it matter?
>> -		if [ -z $tree ]; then
>> -			set_notree $rev
>> -			if [ -n "$newparents" ]; then
>> -				cache_set $rev $rev
>> +		if [ -z $found_first_commit ]; then
>> +			if [ -z $tree ]; then
>> +				set_notree $rev
>> +				if [ -n "$newparents" ]; then
>> +					cache_set $rev $rev
>> +				fi
>> +				continue
>> +			else
>> +				found_first_commit=yes
>>  			fi
>> -			continue
>>  		fi
>>
>>  		newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $?
> 
> Can you explain the logic here?  The old code appears to be using the
> lack of a tree to filter out "mainline" commits from the subtree history
> when splitting.  If that test is only done before seeing a proper
> subtree commit and never after, then any commit mainline commit
> following the first subtree commit in the rev list will miss being
> marked with set_notree and the cache will not have the identity entry
> added.
> 
> Test #36 in t7900-subtree.sh has a mainline commit listed after the
> first subtree commit in the rev list, I believe.
> 
> I'm not positive your change is wrong, I'd just like to understand it
> better.  I'd also like a comment explaining why it works so future
> developers don't get confused.  Overall, I am trying to better comment
> the code as I make my own changes.

It's possible the patch does not work for some cases.  For example, I
don't know how the rejoin variant of splits work.

Some observations:

1) The notree list is never actually used except to identify which
commits have been visited in check_parents.

2) I have no idea what use case is covered by the "if [ -n "$newparents"
]; then cache_set $rev $rev; fi".  I left it in purely for traditional
reasons.  So, clarifying that would go a long way in understanding the
code, and if there is a test for that, I will figure it out.

3a) The bug happens because on the first commit that deletes the subdir,
newparents will not be empty, and the "cache_set $rev $rev" will kick in
and subsequently (when the subdir is added again) the history will
divert into the $rev commit which is not rewritten, but part of the
unsplit tree.  This seems very wrong to me!  See 2).

3b) To be very clear: It seems logically inconsistent to me to ever call
set_notree and cache_set on the same rev.  It also seems logically
inconsistent to me to call cache_set rev1 rev2 where rev2 is not
rewritten.  Both seem to be invariant errors that could be caught by
assertions.  They probably should.  In fact, I think my patch makes the
questionable if-case to be dead code, because newparents is never
non-empty before found_first_commit is true.  As such, I think it could
be eliminated.  But I am not 100% sure, as I don't know the intention of
the original code.

4) My patch only preserves the special handling of empty trees up to the
first commit that introduces subdir, because we don't want an empty
commit at the beginning.  After that, empty subdirs are not special at
all - the empty tree is replaced by EMPTY_TREE and handled as if it's a
normal subdir commit.  copy_and_skip will do the right thing.

5) I didn't test any case with multiple parents (merge commits).  There
are several of those cases (merge commits into empty subdirs, branches
with different non-empty subdirs from empty ones), and they don't apply
to my use case (git-svn conversion).  I read the copy_and_skip code and
see that it optimizes some of those cases, and although I didn't see an
obvious problem, I didn't think too deeply about it.

Thanks,
Marcus



-- 
s<e>mantics GmbH
Viktoriaallee 45
52066 Aachen
Web: www.semantics.de
Registergericht  : Amtsgericht Aachen, HRB 8189
Geschäftsführer  : Kay Heiligenhaus M.A.
                   Dipl. Ing. José de la Rosa
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]