It's been quite a while since I made this specific change, but I'll attach the relevant portion of the diff below. I may be misremembering parts of it, and I apologize in advance. This was based on an earlier version of the script, and I can see some other changes have been made since I forked, but perhaps this will still explain what I did to work around our problem.

Within process_split_commit, there's logic that tries to distinguish between commits which are mainline and commits which are subtree. There's even a comment in the relevant section asking "Is there no better way? Does it matter?" Well, the answer was yes, it mattered: we were picking up mainline commits that were there before the initial add of a subtree, those were getting sucked in as if they were subtree commits, and then all the remaining hashes were off.

What this change does is check for the existence of a single, known file. We keep a file called "subtrees.csv" in the root of our mainline repo, and it defines the various subtrees that comprise the mainline. Therefore, if that file exists in a commit, I can say with certainty that it is a mainline commit. So when that dodgy check comes up, it checks for the file first, then falls back to the old behavior.

Partial diff follows; feel free to try it out if it sounds similar to the problem you're facing. Change the specific filename for your needs, obviously. To be clear, this is NOT something I'm submitting for inclusion in the general release; it's very repo-specific, and I just hope it might help a fellow soul.
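Before the diff itself, here is a minimal standalone sketch of the marker-file check, for anyone who wants to try the idea outside git-subtree.sh. It is not taken from the diff: the helper names has_blob_entry and is_mainline_commit are made up for illustration, and it assumes the marker file sits at the repository root and has a plain name that `git ls-tree` prints unquoted.

```shell
# Hypothetical helper: reads `git ls-tree` output on stdin and succeeds
# if it contains a regular file (blob) entry named exactly "$1".
has_blob_entry () {
	marker="$1"
	# Each ls-tree line is: <mode> SP <type> SP <object> TAB <path>
	while read -r mode type object name
	do
		if test "$type" = "blob" && test "$name" = "$marker"
		then
			return 0
		fi
	done
	return 1
}

# Hypothetical helper: succeeds if commit "$1" contains marker file "$2".
# Usage: is_mainline_commit HEAD subtrees.csv && echo "mainline commit"
is_mainline_commit () {
	git ls-tree "$1" -- "$2" | has_blob_entry "$2"
}
```

The parsing is split from the git call only so the matching logic can be exercised without a repository; a commit with no such file produces empty ls-tree output and is treated as "not known to be mainline".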

@@ -506,6 +499,20 @@ subtree_for_commit () {
 	done
 }
 
+subtree_for_csv () {
+	commit="$1"
+	dir="$2"
+	git ls-tree "$commit" -- "$dir" |
+	while read mode type tree name
+	do
+		assert test "$name" = "$dir"
+		assert test "$type" = "blob" -o "$type" = "commit"
+		test "$type" = "commit" && continue # ignore submodules
+		echo $tree
+		break
+	done
+}
+
 tree_changed () {
 	tree=$1
 	shift
@@ -667,9 +674,17 @@ process_split_commit () {
 	if test -z "$tree"
 	then
 		set_notree "$rev"
-		if test -n "$newparents"
+		subtreescsv=$(subtree_for_csv "$rev" "subtrees.csv")
+		debug "${indentprefix} subtrees.csv tree is: $subtreescsv"
+
+		# ugly. is there no better way to tell if this is a subtree
+		# vs. a mainline commit? Does it matter?
+		if test -z "$subtreescsv"
 		then
-			cache_set "$rev" "$rev"
+			if test -n "$newparents"
+			then
+				cache_set "$rev" "$rev"
+			fi
 		fi
 		return
 	fi

-- 
Roger Strain

-----Original Message-----
From: Ed Maste <emaste@xxxxxxxxxxx>
To: "Strain, Roger L." <roger.strain@xxxxxxxx>
Cc: git@xxxxxxxxxxxxxxx <git@xxxxxxxxxxxxxxx>, marc@xxxxxxx <marc@xxxxxxx>
Subject: Re: Regression in git-subtree.sh, introduced in 2.20.1, after 315a84f9aa0e2e629b0680068646b0032518ebed
Date: Mon, 09 Dec 2019 06:45:57 -0500

On Mon, 9 Dec 2019 at 09:29, Strain, Roger L. <roger.strain@xxxxxxxx> wrote:
> I've had to further customize the script for our internal use, and
> those changes aren't something that would be useful for the public at
> large.

Would you describe the sort of problem you have to work around with custom changes?

I'm starting on a path of trying to fix git-subtree for failures[1] encountered in a prototype conversion of the FreeBSD repository from svn to git. The misbehaviour I encounter occurs when split encounters a commit for which the path being split is empty in 'git ls-tree', and the commit is actually not a subtree commit. I'm currently experimenting with hacks to skip specific hashes during the initial subtree split.
On reading your mail I realize I could address my issue by testing for the existence of a specific file though, which makes me wonder if the issue you have is similar.

[1] https://lore.kernel.org/git/CAPyFy2AsmaxU-BDf_teZJE5hiaVpTSZc8fftnuXPb_4-j7j5Fw@xxxxxxxxxxxxxx/