On 01/20/2016 05:05 AM, David A. Greene wrote: > Marcus Brinkmann <m.brinkmann@xxxxxxxxxxxx> writes: > >> 'git subtree split' will fail if the history of the subtree has empty >> tree commits (or trees that are considered empty, such as submodules). >> This fix keeps track of this condition and correctly follows the history >> over such commits. > > Thanks for working on this! Please add a test to t7900-subtree.sh. I couldn't get the tests to run and I couldn't find documentation on how to run it. If you enlighten me I can add a test :) >> @@ -625,12 +629,16 @@ cmd_split() >> >> # ugly. is there no better way to tell if this is a subtree >> # vs. a mainline commit? Does it matter? >> - if [ -z $tree ]; then >> - set_notree $rev >> - if [ -n "$newparents" ]; then >> - cache_set $rev $rev >> + if [ -z $found_first_commit ]; then >> + if [ -z $tree ]; then >> + set_notree $rev >> + if [ -n "$newparents" ]; then >> + cache_set $rev $rev >> + fi >> + continue >> + else >> + found_first_commit=yes >> fi >> - continue >> fi >> >> newrev=$(copy_or_skip "$rev" "$tree" "$newparents") || exit $? > > Can you explain the logic here? The old code appears to be using the > lack of a tree to filter out "mainline" commits from the subtree history > when splitting. If that test is only done before seeing a proper > subtree commit and never after, then any commit mainline commit > following the first subtree commit in the rev list will miss being > marked with set_notree and the cache will not have the identity entry > added. > > Test #36 in t7900-subtree.sh has a mainline commit listed after the > first subtree commit in the rev list, I believe. > > I'm not positive your change is wrong, I'd just like to understand it > better. I'd also like a comment explaining why it works so future > developers don't get confused. Overall, I am trying to better comment > the code as I make my own changes. It's possible the patch does not work for some cases. For example, I don't know how the rejoin variant of splits work. Some observations: 1) The notree list is never actually used except to identify which commits have been visited in check_parents. 2) I have no idea what use case is covered by the "if [ -n "$newparents" ]; then cache_set $rev $rev; fi". I left it in purely for traditional reasons. So, clarifying that would go a long way in understanding the code, and if there is a test for that, I will figure it out. 3a) The bug happens because on the first commit that deletes the subdir, newparents will not be empty, and the "cache_set $rev $rev" will kick in and subsequently (when the subdir is added again) the history will divert into the $rev commit which is not rewritten, but part of the unsplit tree. This seems very wrong to me! See 2). 3b) To be very clear: It seems logically inconsistent to me to ever call set_notree and cache_set on the same rev. It also seems logically inconsistent to me to call cache_set rev1 rev2 where rev2 is not rewritten. Both seem to be invariant errors that could be caught by assertions. They probably should. In fact, I think my patch makes the questionable if-case to be dead code, because newparents is never non-empty before found_first_commit is true. As such, I think it could be eliminated. But I am not 100% sure, as I don't know the intention of the original code. 4) My patch only preserves the special handling of empty trees up to the first commit that introduces subdir, because we don't want an empty commit at the beginning. After that, empty subdirs are not special at all - the empty tree is replaced by EMPTY_TREE and handled as if it's a normal subdir commit. copy_and_skip will do the right thing. 5) I didn't test any case with multiple parents (merge commits). There are several of those cases (merge commits into empty subdirs, branches with different non-empty subdirs from empty ones), and they don't apply to my use case (git-svn conversion). I read the copy_and_skip code and see that it optimizes some of those cases, and although I didn't see an obvious problem, I didn't think too deeply about it. Thanks, Marcus -- s<e>mantics GmbH Viktoriaallee 45 52066 Aachen Web: www.semantics.de Registergericht : Amtsgericht Aachen, HRB 8189 Geschäftsführer : Kay Heiligenhaus M.A. Dipl. Ing. José de la Rosa -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html