(Apologies for message formatting, fighting Outlook)

> I've been trying to understand how the subtree cache (mis)behaves in
> this case. The cache is initially seeded from find_existing_splits(),
> which finds these lines the adc8ecf commit message:
>
> git-subtree-mainline: 9b6e8f677b700a00e9f1715e2624bf5ed756dc85
> git-subtree-split: 5280958b2f997c3ce7bff7192cceb19f55b45cd9
>
> and adds these corresponding entries to the cache:
>
> 9b6e8f6 -> 5280958
> 5280958 -> 5280958
>
> In other words, the cache starts out claiming that 5280958 is the
> equivalent subtree commit for the 9b6e8f6 mainline commit.
> However, in my naive understanding this does not make sense, as
> 9b6e8f6 _precedes_ the subtree addition, and has no content in
> the relevant subdir.

I think you've identified the exact problem right here. In the normal split/rejoin commits, the mainline commit *prior to* the merge commit does, in fact, represent the same subtree state as the subtree commit which is also merged at that point. But in the case of an add, that's not true, and I'm actually a little surprised the same commit message markers are generated.

What really should be captured by the initial cache seeding is that the add merge commit *itself* has the same subtree content. However, that can't be determined at the time the merge commit is created, as the hash of that merge commit is determined by the commit message itself.

I think a possible solution to this would be modifying the initial cache process to isolate the Add commits and handle them differently. Rather than using the hashes in the commit message, it should map the merge commit itself to the subtree-split commit, and either do nothing with the subtree-mainline commit hash, or explicitly set it to notree. However, this will complicate the logic of building the initial cache, as it currently only cares about the existence of those simple lines, and adds the mappings, erroneously as you have noted in this case.
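
To make that idea a bit more concrete, here is a rough Python sketch of the seeding logic I have in mind. This is not the actual git-subtree shell code: seed_cache, looks_like_add, and the NOTREE string are names made up for illustration, and recognizing an add by a subject line starting with "Add '" is an assumption about the generated message rather than something I've checked against every case.

```python
import re

# Hypothetical marker meaning "this mainline commit has no subtree content".
NOTREE = "notree"

# Matches the two marker lines that the cache seeding looks for.
MARKER = re.compile(
    r"^git-subtree-(mainline|split):\s*([0-9a-f]{7,40})\s*$",
    re.MULTILINE,
)


def looks_like_add(message):
    """Assumed heuristic: add commits have a subject starting with "Add '"."""
    return message.startswith("Add '")


def seed_cache(commits, is_add=looks_like_add):
    """Build the initial mainline -> subtree mapping.

    commits is an iterable of (commit_hash, commit_message) pairs.
    """
    cache = {}
    for commit_hash, message in commits:
        markers = dict(MARKER.findall(message))
        mainline = markers.get("mainline")
        split = markers.get("split")
        if not (mainline and split):
            continue  # not a subtree add/rejoin commit

        if is_add(message):
            # For an add, the merge commit itself carries the subtree
            # content; the recorded mainline commit predates the subtree.
            cache[commit_hash] = split
            cache[mainline] = NOTREE
        else:
            # Normal split/rejoin: the prior mainline commit matches.
            cache[mainline] = split
        cache[split] = split
    return cache
```

Fed the adc8ecf commit with the two marker lines quoted above, this would record adc8ecf -> 5280958... and mark 9b6e8f6... as notree, instead of claiming 9b6e8f6 -> 5280958.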

It might also be worth changing the commit message provided for adds so it no longer generates the incorrect assertion that the mainline commit is identical subtree-wise, but even with that change, support for correctly handling existing commits would still be ideal.

Currently, we're using a local version of the subtree script based on some changes laid out here:

https://github.com/gitgitgadget/git/pull/493

I'm hopeful that changeset will eventually land here, as it helps with several complex issues in our repositories. I bring that up also because it introduces some additional tools for managing the initial cache, allowing manual mapping of one commit to another. That version might allow some level of testing on whether this idea would correct the problem described.

--
Roger Strain