Hi Phillip, On Mon, 21 Nov 2022, Phillip Wood wrote: > [...] > > @@ range-diff.c: struct patch_util { > > + > > + line = contents.buf; > > + size = contents.len; > > -+ for (; size > 0; size -= len, line += len) { > > ++ for (; size; size -= len, line += len) { > > + const char *p; > > + > > -+ len = find_eol(line, size); > > ++ len = find_next_line(line, size); > > + > > -+ if (state == MBOX_BEFORE_HEADER) { > > ++ if (state == MBOX_BEFORE_HEADER || > > ++ (state == MBOX_IN_DIFF && line[0] == 'F')) { > > If we see an 'F' when parsing a diff then it is probably a From: line > indicating that we've reached the start of a new patch. I think we need to add > the diff we've just parsed to the list of commits before continuing and change > state to MBOX_BEFORE_HEADER otherwise we'll hit the continue just below this > comment with state unchanged. As always, thank you *so* much for modeling how to perform productive, actionable and *thorough* code reviews that focus on the relevant. You spotted something important. It is a bug, as you suspected, and there is even more: `util` leaks because it is allocated both after the patch was parsed and when we see a diff header, without releasing the first (but granted, this leak has been there before). You have no idea how much I appreciate your diligence; I understand how much of an effort it takes to study patches in depth until you understand them enough to spot complex issues like this one. (In Parkinson's parlance: I am grateful that you focus on the design of the nuclear power plant, see https://en.wikipedia.org/w/index.php?title=Atwood%27s_duck) Back to our regularly scheduled programming (pun intended): I indeed made a mistake here and fixed it, by removing that 'F' check again, and instead adding a `goto` label and jumping to that upon encountering the end of the diff. I also de-duplicated the `util = xcalloc(1, sizeof(*util))` calls so that we do that _only_ in the `MBOX_BEFORE_HEADER` arm, and no longer at the end of the `MBOX_IN_DIFF` arm. Since I already looked at a leak, I looked further and saw that also `copy` is leaked in `show_range_diff()` in case of an mbox. I've addressed that, too. > > > + if (!skip_prefix(line, "From ", &p)) > > + continue; > > + > > @@ range-diff.c: struct patch_util { > > + author = subject = NULL; > > + > > + state = MBOX_IN_HEADER; > > ++ continue; > > + } > > + > > + if (starts_with(line, "diff --git ")) { > > @@ range-diff.c: struct patch_util { > > + strbuf_addch(&buf, '\n'); > > + if (!util->diff_offset) > > + util->diff_offset = buf.len; > > -+ line[len - 1] = '\n'; > > ++ > > + orig_len = len; > > -+ len = parse_git_diff_header(&root, &linenr, 1, > > line, > > -+ len, size, > > &patch); > > ++ /* `find_next_line()`'s replaced the LF with a > > NUL */ > > ++ line[len - 1] = '\n'; > > ++ len = len > 1 && line[len - 2] == '\r' ? > > ++ error(_("cannot handle diff headers > > with " > > ++ "CR/LF line endings")) : > > ++ parse_git_diff_header(&root, &linenr, > > 1, line, > > ++ len, size, > > &patch); > > + if (len < 0) { > > + error(_("could not parse git header > > '%.*s'"), > > + orig_len, line); > > ++ release_patch(&patch); > > Thank you for using release_patch() rather than the "free random struct > members" approach that was suggested. Indeed, it sounded like a specific suggestion to release a specific `struct` member, but that was of course not the end of the story, and I completed the analysis myself that I had wished the reviewer would have completed _before_ sending off a review. > > @@ t/t3206-range-diff.sh: test_expect_success 'ranges with pathspecs' > > ' > > +test_expect_success 'compare range vs mbox' ' > > + git format-patch --stdout topic..mode-only-change >mbox && > > + git range-diff topic...mode-only-change >expect && > > ++ > > + git range-diff mode-only-change..topic mbox:./mbox >actual && > > + test_cmp expect actual && > > -+ git range-diff mode-only-change..topic mbox:- <mbox >actual && > > -+ test_cmp expect actual > > ++ > > I'm a bit confused by this sed command, I've annotated it with my probably > flawed understanding. > > > ++ sed -e "/^From: .*/{ > > ++ h > > This stores the From: header in the hold space 👍 > > ++ s/.*/From: Bugs Bunny <bugs@xxxxxx>/ > > This changes the From: header in the pattern space 👍 > > ++ :1 > > ++ N > > ++ /[ -z]$/b1 > > We loop until we find a line that does not end with a space, letter or number > adding the lines to the hold space I would have _loved_ to match on an empty line, i.e. `/[^\n]$/b1`. But that construct is not understood by the `sed` on macOS. I even went so far as to search for the source code of a BSD `sed` (and I found it, and modified the code so that it builds on Linux, see https://github.com/dscho/sed-bsd) to try a couple of things, but could not make it work with any variation of `\n`. Therefore, I settled on expecting all the lines in the commit header to end in printable ASCII characters between the space and the `z`. > > ++ G > > This appends the hold space to the pattern space, then the pattern space is > printed. 👍 > Doesn't this mean we end up with two From: headers? Is the in-body From: > line already present? There is no in-body `From:` header because the patch author matches the `GIT_AUTHOR_IDENT` that is in effect while running the `format-patch` command. Let me show you what this `sed` call deals with. In the local test run, it modified an `mbox` starting with this: From 4d39cb329d3ef4c8e69b43859c2e11adb83f8613 Mon Sep 17 00:00:00 2001 From: Thomas Rast <trast@xxxxxxxxxxx> Date: Mon, 22 Jul 2013 11:23:44 +0200 Subject: [PATCH 1/3] s/4/A/ + add other-file --- file | 2 +- other-file | 0 [...] to a modified `mbox` that starts with this: From 4d39cb329d3ef4c8e69b43859c2e11adb83f8613 Mon Sep 17 00:00:00 2001 From: Bugs Bunny <bugs@xxxxxx> Date: Mon, 22 Jul 2013 11:23:44 +0200 Subject: [PATCH 1/3] s/4/A/ + add other-file From: Thomas Rast <trast@xxxxxxxxxxx> --- file | 2 +- other-file | 0 [...] > > > ++ }" <mbox >mbox.from && > > ++ git range-diff mode-only-change..topic mbox:./mbox.from > > >actual.from && > > ++ test_cmp expect actual.from && > > ++ > > ++ append_cr <mbox >mbox.cr && > > ++ test_must_fail git range-diff \ > > ++ mode-only-change..topic mbox:./mbox.cr 2>err && > > ++ grep CR/LF err && > > Thanks for adding that Thank you for the suggestion to add it! And thank you again for modeling how to perform actionable, helpful and productive code reviews, Dscho