> Please do not violate Documentation/CodingGuidelines for our shell > scripted Porcelain, even if it is a script in contrib/ and also >please avoid bash-isms. I believe I have resolved the CodingGuidelines issues. > Also doesn't "subtree" have its own test? If this change is a fix > for some problem(s), can we have a test or two that demonstrate how > the current code without the patch is broken? I was able to add a test that validates against some of the metrics that are tracked when running a split for processing commits. Validated that before my fix the test fails, and after my fix the test passes. >> In the diagram below, 'M' represents the mainline repo branch, 'A' >> represents one subtree, and 'B' represents another. M3 and B1 represent >> a split commit for subtree B that was created from commit M4. M2 and A1 >> represent a split commit made from subtree A that was also created >> based on changes back to and including M4. M1 represents new changes to >> the repo, in this scenario if you try to run a 'git subtree split >> --rejoin' for subtree B, commits M1, M2, and A1, will be included in >> the processing of changes for the new split commit since the last >> split/rejoin for subtree B was at M3. The issue is that by having A1 >> included in this processing the command ends up needing to processing >> every commit down tree A even though none of that is needed or relevant >> to the current command and result. >> >> M1 >> | \ \ >> M2 | | >> | A1 | >> M3 | | >> | | B1 >> M4 | | > The above paragraph explains which different things you drew in the > diagram are representing, but it is not clear how they relate to > each other. Do they for example depict parent-child commit > relationship? What are the wide gaps between these three tracks and > what are the short angled lines leaning to the left near the tip? > Is the time/topology flowing from bottom to top? I am realizing I made a few mistakes with trying to illustrate the diagram which I will attempt to make more clear below. As for the 3 columns in the diagram, 'M' represents the mainline branch of the repo being developed in, while column 'A' represents the history of a subtree 'A' included in the repo, and column 'B' also represents the history of a subtree 'B' in the repo. The diagram attempts to illustrate when a 'git subtree split --rejoin' is used, that there is a commit made in the subtrees history, and that is then merged into the mainline repo branch. M1 | | M2 --- | | A1 | | M3 ---------- | | | B1 M4 | | Hopefully that helps better illustrate the state of the repo before the new 'git subtree split --rejoin' attempt and why it results in the described issue. >> +should_ignore_subtree_commit () { >> + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] >> + then >> + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] > > Here $dir is a free variable that comes from outside. The caller > does not supply it as a parameter to this function (and the caller > does not receive it as its parameter from its caller). Yet the file > as a whole seems to liberally make assignments to it ("git grep dir=" > on the file counts 7 assignments). Are we sure we are looking for > the right $dir in this particular grep? > > Side note: I am not familiar with this part of the code at > all, so do not take it as "here is a bug", but more as "this > smells error prone." >From my testing and what I see for '$dir' usage in the 'cmd_split' function which leads to this code it is the correct '$dir', although I see your point about it being reassigned in different places which makes it error prone. I switched this to use the command line argument '$arg_prefix' since the subtree prefix passed into the command is what we actually want in this case so we can filter out commits from other subtrees. > Also can $dir have regular expressions special characters? "The > existing code and new code alike, git-subtree is not prepared to > handle directory names with RE special characters well at all, so > do not use them if you do not want your history broken" is an > acceptable answer. As far as I can tell from looking at the code (which I only recently started using) the '$dir' which is based on the subtree prefix is not setup to handle this. > The caller of this function process_split_commit is cmd_split and > process_split_commit (hence this function) is called repeatedly > inside a loop. This function makes a traversal over the entire > history for each and every iteration in "good" cases where there is > no 'mainline' or 'subtree-dir' commits for the given $dir. > > I wonder if it is more efficient to enumerate all commits that hits > these grep criteria in the cmd_split before it starts to call > process_split_commit repeatedly. If it knows which commit can be > ignored beforehand, it can skip and not call process_split_commit, > no? Moved this functionality into the 'cmd_split' function as suggested. >> + then >> + return 0 >> + fi >> + fi >> + return 1 >> +} >> + >> # Usage: process_split_commit REV PARENTS >> process_split_commit () { >> assert test $# = 2 >> local rev="$1" >> local parents="$2" > > These seem to assume that $1 and $2 can have $IFS in them, so > shouldn't ... > >> + if should_ignore_subtree_commit $rev > > ... this call too enclose $rev inside a pair of double-quotes for > consistency? We know the loop in the cmd_split that calls this > function is reading from "rev-list --parents" and $rev is a 40-hex > commit object name (and $parents can have more than one 40-hex > commit object names separated with SP), so it is safe to leave $rev > unquoted, but it pays to be consistent to help make the code more > readable. Updated this for consistency On Mon, Sep 18, 2023 at 9:04 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > "Zach FettersMoore via GitGitGadget" <gitgitgadget@xxxxxxxxx> > writes: > > > In the diagram below, 'M' represents the mainline repo branch, 'A' > > represents one subtree, and 'B' represents another. M3 and B1 represent > > a split commit for subtree B that was created from commit M4. M2 and A1 > > represent a split commit made from subtree A that was also created > > based on changes back to and including M4. M1 represents new changes to > > the repo, in this scenario if you try to run a 'git subtree split > > --rejoin' for subtree B, commits M1, M2, and A1, will be included in > > the processing of changes for the new split commit since the last > > split/rejoin for subtree B was at M3. The issue is that by having A1 > > included in this processing the command ends up needing to processing > > every commit down tree A even though none of that is needed or relevant > > to the current command and result. > > > > M1 > > | \ \ > > M2 | | > > | A1 | > > M3 | | > > | | B1 > > M4 | | > > The above paragraph explains which different things you drew in the > diagram are representing, but it is not clear how they relate to > each other. Do they for example depict parent-child commit > relationship? What are the wide gaps between these three tracks and > what are the short angled lines leaning to the left near the tip? > Is the time/topology flowing from bottom to top? > > > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh > > index e0c5d3b0de6..e9250dfb019 100755 > > --- a/contrib/subtree/git-subtree.sh > > +++ b/contrib/subtree/git-subtree.sh > > @@ -778,12 +778,29 @@ ensure_valid_ref_format () { > > die "fatal: '$1' does not look like a ref" > > } > > > > +# Usage: check if a commit from another subtree should be ignored from processing for splits > > Way overlong line. Please split them accordingly. I won't comment > on what CodingGuidelines tells us already, in this review, but have > a few comments here: > > > +should_ignore_subtree_commit () { > > + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] > > + then > > + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] > > Here $dir is a free variable that comes from outside. The caller > does not supply it as a parameter to this function (and the caller > does not receive it as its parameter from its caller). Yet the file > as a whole seems to liberally make assignments to it ("git grep dir=" > on the file counts 7 assignments). Are we sure we are looking for > the right $dir in this particular grep? > > Side note: I am not familiar with this part of the code at > all, so do not take it as "here is a bug", but more as "this > smells error prone." > > Also can $dir have regular expressions special characters? "The > existing code and new code alike, git-subtree is not prepared to > handle directory names with RE special characters well at all, so > do not use them if you do not want your history broken" is an > acceptable answer. > > The caller of this function process_split_commit is cmd_split and > process_split_commit (hence this function) is called repeatedly > inside a loop. This function makes a traversal over the entire > history for each and every iteration in "good" cases where there is > no 'mainline' or 'subtree-dir' commits for the given $dir. > > I wonder if it is more efficient to enumerate all commits that hits > these grep criteria in the cmd_split before it starts to call > process_split_commit repeatedly. If it knows which commit can be > ignored beforehand, it can skip and not call process_split_commit, > no? > > > + then > > + return 0 > > + fi > > + fi > > + return 1 > > +} > > + > > # Usage: process_split_commit REV PARENTS > > process_split_commit () { > > assert test $# = 2 > > local rev="$1" > > local parents="$2" > > These seem to assume that $1 and $2 can have $IFS in them, so > shouldn't ... > > > + if should_ignore_subtree_commit $rev > > ... this call too enclose $rev inside a pair of double-quotes for > consistency? We know the loop in the cmd_split that calls this > function is reading from "rev-list --parents" and $rev is a 40-hex > commit object name (and $parents can have more than one 40-hex > commit object names separated with SP), so it is safe to leave $rev > unquoted, but it pays to be consistent to help make the code more > readable. > > > + then > > + return > > + fi > > + > > if test $indent -eq 0 > > then > > revcount=$(($revcount + 1)) > > > > base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518