Re: [PATCH] subtree: fix split processing with multiple subtrees present

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Please do not violate Documentation/CodingGuidelines for our shell
> scripted Porcelain, even if it is a script in contrib/ and also
>please avoid bash-isms.

I believe I have resolved the CodingGuidelines issues.

> Also doesn't "subtree" have its own test?  If this change is a fix
> for some problem(s), can we have a test or two that demonstrate how
> the current code without the patch is broken?

I was able to add a test that validates against some of the metrics that
are tracked when running a split  for processing commits. Validated that
before my fix the test fails, and after my fix the test passes.

>> In the diagram below, 'M' represents the mainline repo branch, 'A'
>> represents one subtree, and 'B' represents another. M3 and B1 represent
>> a split commit for subtree B that was created from commit M4. M2 and A1
>> represent a split commit made from subtree A that was also created
>> based on changes back to and including M4. M1 represents new changes to
>> the repo, in this scenario if you try to run a 'git subtree split
>> --rejoin' for subtree B, commits M1, M2, and A1, will be included in
>> the processing of changes for the new split commit since the last
>> split/rejoin for subtree B was at M3. The issue is that by having A1
>> included in this processing the command ends up needing to processing
>> every commit down tree A even though none of that is needed or relevant
>> to the current command and result.
>>
>> M1
>>  |      \       \
>> M2       |       |
>>  |      A1       |
>> M3       |       |
>>  |       |      B1
>> M4       |       |

> The above paragraph explains which different things you drew in the
> diagram are representing, but it is not clear how they relate to
> each other.  Do they for example depict parent-child commit
> relationship?  What are the wide gaps between these three tracks and
> what are the short angled lines leaning to the left near the tip?
> Is the time/topology flowing from bottom to top?

I am realizing I made a few mistakes with trying to illustrate the diagram
which I will attempt to make more clear below. As for the 3 columns in the
diagram, 'M' represents the mainline branch of the repo being developed in,
while column 'A' represents the history of a subtree 'A' included in the
repo, and column 'B' also represents the history of a subtree 'B' in the
repo. The diagram attempts to illustrate when a 'git subtree split --rejoin'
is used, that there is a commit made in the subtrees history, and that is
then merged into the mainline repo branch.

M1
 |
 |
M2 --- |
 |     A1
 |     |
M3 ---------- |
 |     |      B1
M4     |      |

Hopefully that helps better illustrate the state of the repo before the new
'git subtree split --rejoin' attempt and why it results in the described issue.

>> +should_ignore_subtree_commit () {
>> +  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
>> +  then
>> +    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
>
> Here $dir is a free variable that comes from outside.  The caller
> does not supply it as a parameter to this function (and the caller
> does not receive it as its parameter from its caller).  Yet the file
> as a whole seems to liberally make assignments to it ("git grep dir="
> on the file counts 7 assignments).  Are we sure we are looking for
> the right $dir in this particular grep?
>
>  Side note: I am not familiar with this part of the code at
>  all, so do not take it as "here is a bug", but more as "this
>  smells error prone."

>From my testing and what I see for '$dir' usage in the 'cmd_split'
function which leads to this code it is the correct '$dir', although
I see your point about it being reassigned in different places which
makes it error prone. I switched this to use the command
line argument '$arg_prefix' since the subtree prefix passed into
the command is what we actually want in this case so we can filter
out commits from other subtrees.

> Also can $dir have regular expressions special characters?  "The
> existing code and new code alike, git-subtree is not prepared to
> handle directory names with RE special characters well at all, so
> do not use them if you do not want your history broken" is an
> acceptable answer.

As far as I can tell from looking at the code (which I only recently
started using) the '$dir' which is based on the subtree prefix is
not setup to handle this.

> The caller of this function process_split_commit is cmd_split and
> process_split_commit (hence this function) is called repeatedly
> inside a loop.  This function makes a traversal over the entire
> history for each and every iteration in "good" cases where there is
> no 'mainline' or 'subtree-dir' commits for the given $dir.
>
> I wonder if it is more efficient to enumerate all commits that hits
> these grep criteria in the cmd_split before it starts to call
> process_split_commit repeatedly.  If it knows which commit can be
> ignored beforehand, it can skip and not call process_split_commit,
> no?

Moved this functionality into the 'cmd_split' function as suggested.

>> +    then
>> +      return 0
>> +    fi
>> +  fi
>> +  return 1
>> +}
>> +
>>  # Usage: process_split_commit REV PARENTS
>>  process_split_commit () {
>>   assert test $# = 2
>>   local rev="$1"
>>   local parents="$2"
>
> These seem to assume that $1 and $2 can have $IFS in them, so
> shouldn't ...
>
>> +    if should_ignore_subtree_commit $rev
>
> ... this call too enclose $rev inside a pair of double-quotes for
> consistency?  We know the loop in the cmd_split that calls this
> function is reading from "rev-list --parents" and $rev is a 40-hex
> commit object name (and $parents can have more than one 40-hex
> commit object names separated with SP), so it is safe to leave $rev
> unquoted, but it pays to be consistent to help make the code more
> readable.

Updated this for consistency


On Mon, Sep 18, 2023 at 9:04 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Zach FettersMoore via GitGitGadget" <gitgitgadget@xxxxxxxxx>
> writes:
>
> > In the diagram below, 'M' represents the mainline repo branch, 'A'
> > represents one subtree, and 'B' represents another. M3 and B1 represent
> > a split commit for subtree B that was created from commit M4. M2 and A1
> > represent a split commit made from subtree A that was also created
> > based on changes back to and including M4. M1 represents new changes to
> > the repo, in this scenario if you try to run a 'git subtree split
> > --rejoin' for subtree B, commits M1, M2, and A1, will be included in
> > the processing of changes for the new split commit since the last
> > split/rejoin for subtree B was at M3. The issue is that by having A1
> > included in this processing the command ends up needing to processing
> > every commit down tree A even though none of that is needed or relevant
> > to the current command and result.
> >
> > M1
> >  |      \       \
> > M2       |       |
> >  |      A1       |
> > M3       |       |
> >  |       |      B1
> > M4       |       |
>
> The above paragraph explains which different things you drew in the
> diagram are representing, but it is not clear how they relate to
> each other.  Do they for example depict parent-child commit
> relationship?  What are the wide gaps between these three tracks and
> what are the short angled lines leaning to the left near the tip?
> Is the time/topology flowing from bottom to top?
>
> > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> > index e0c5d3b0de6..e9250dfb019 100755
> > --- a/contrib/subtree/git-subtree.sh
> > +++ b/contrib/subtree/git-subtree.sh
> > @@ -778,12 +778,29 @@ ensure_valid_ref_format () {
> >               die "fatal: '$1' does not look like a ref"
> >  }
> >
> > +# Usage: check if a commit from another subtree should be ignored from processing for splits
>
> Way overlong line.  Please split them accordingly.  I won't comment
> on what CodingGuidelines tells us already, in this review, but have
> a few comments here:
>
> > +should_ignore_subtree_commit () {
> > +  if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ]
> > +  then
> > +    if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]]
>
> Here $dir is a free variable that comes from outside.  The caller
> does not supply it as a parameter to this function (and the caller
> does not receive it as its parameter from its caller).  Yet the file
> as a whole seems to liberally make assignments to it ("git grep dir="
> on the file counts 7 assignments).  Are we sure we are looking for
> the right $dir in this particular grep?
>
>         Side note: I am not familiar with this part of the code at
>         all, so do not take it as "here is a bug", but more as "this
>         smells error prone."
>
> Also can $dir have regular expressions special characters?  "The
> existing code and new code alike, git-subtree is not prepared to
> handle directory names with RE special characters well at all, so
> do not use them if you do not want your history broken" is an
> acceptable answer.
>
> The caller of this function process_split_commit is cmd_split and
> process_split_commit (hence this function) is called repeatedly
> inside a loop.  This function makes a traversal over the entire
> history for each and every iteration in "good" cases where there is
> no 'mainline' or 'subtree-dir' commits for the given $dir.
>
> I wonder if it is more efficient to enumerate all commits that hits
> these grep criteria in the cmd_split before it starts to call
> process_split_commit repeatedly.  If it knows which commit can be
> ignored beforehand, it can skip and not call process_split_commit,
> no?
>
> > +    then
> > +      return 0
> > +    fi
> > +  fi
> > +  return 1
> > +}
> > +
> >  # Usage: process_split_commit REV PARENTS
> >  process_split_commit () {
> >       assert test $# = 2
> >       local rev="$1"
> >       local parents="$2"
>
> These seem to assume that $1 and $2 can have $IFS in them, so
> shouldn't ...
>
> > +    if should_ignore_subtree_commit $rev
>
> ... this call too enclose $rev inside a pair of double-quotes for
> consistency?  We know the loop in the cmd_split that calls this
> function is reading from "rev-list --parents" and $rev is a 40-hex
> commit object name (and $parents can have more than one 40-hex
> commit object names separated with SP), so it is safe to leave $rev
> unquoted, but it pays to be consistent to help make the code more
> readable.
>
> > +    then
> > +         return
> > +    fi
> > +
> >       if test $indent -eq 0
> >       then
> >               revcount=$(($revcount + 1))
> >
> > base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux