Re: [PATCH v2] range-diff: support reading mbox files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Phillip,

On Mon, 21 Nov 2022, Phillip Wood wrote:

> [...]
> >       @@ range-diff.c: struct patch_util {
> >        +
> >        +	line = contents.buf;
> >        +	size = contents.len;
> >       -+	for (; size > 0; size -= len, line += len) {
> >       ++	for (; size; size -= len, line += len) {
> >        +		const char *p;
> >        +
> >       -+		len = find_eol(line, size);
> >       ++		len = find_next_line(line, size);
> >       +
> >       -+		if (state == MBOX_BEFORE_HEADER) {
> >       ++		if (state == MBOX_BEFORE_HEADER ||
> >       ++		    (state == MBOX_IN_DIFF && line[0] == 'F')) {
>
> If we see an 'F' when parsing a diff then it is probably a From: line
> indicating that we've reached the start of a new patch. I think we need to add
> the diff we've just parsed to the list of commits before continuing and change
> state to MBOX_BEFORE_HEADER otherwise we'll hit the continue just below this
> comment with state unchanged.

As always, thank you *so* much for modeling how to perform productive,
actionable and *thorough* code reviews that focus on the relevant.

You spotted something important. It is a bug, as you suspected, and there
is even more: `util` leaks because it is allocated both after the patch
was parsed and when we see a diff header, without releasing the first (but
granted, this leak has been there before).

You have no idea how much I appreciate your diligence; I understand how
much of an effort it takes to study patches in depth until you understand
them enough to spot complex issues like this one. (In Parkinson's
parlance: I am grateful that you focus on the design of the nuclear power
plant, see https://en.wikipedia.org/w/index.php?title=Atwood%27s_duck)

Back to our regularly scheduled programming (pun intended): I indeed made
a mistake here and fixed it, by removing that 'F' check again, and instead
adding a `goto` label and jumping to that upon encountering the end of the
diff.

I also de-duplicated the `util = xcalloc(1, sizeof(*util))` calls so that
we do that _only_ in the `MBOX_BEFORE_HEADER` arm, and no longer at the
end of the `MBOX_IN_DIFF` arm.

Since I already looked at a leak, I looked further and saw that also
`copy` is leaked in `show_range_diff()` in case of an mbox. I've addressed
that, too.

>
> >        +			if (!skip_prefix(line, "From ", &p))
> >        +				continue;
> >        +
> >       @@ range-diff.c: struct patch_util {
> >        +			author = subject = NULL;
> >        +
> >        +			state = MBOX_IN_HEADER;
> >       ++			continue;
> >        +		}
> >        +
> >        +		if (starts_with(line, "diff --git ")) {
> >       @@ range-diff.c: struct patch_util {
> >        +			strbuf_addch(&buf, '\n');
> >        +			if (!util->diff_offset)
> >        +				util->diff_offset = buf.len;
> >       -+			line[len - 1] = '\n';
> >       ++
> >       +			orig_len = len;
> >       -+			len = parse_git_diff_header(&root, &linenr, 1,
> >       line,
> >       -+						    len, size,
> >       &patch);
> >       ++			/* `find_next_line()`'s replaced the LF with a
> >       NUL */
> >       ++			line[len - 1] = '\n';
> >       ++			len = len > 1 && line[len - 2] == '\r' ?
> >       ++				error(_("cannot handle diff headers
> >       with "
> >       ++					"CR/LF line endings")) :
> >       ++				parse_git_diff_header(&root, &linenr,
> >       1, line,
> >       ++						      len, size,
> >       &patch);
> >        +			if (len < 0) {
> >        +				error(_("could not parse git header
> >        '%.*s'"),
> >        +				      orig_len, line);
> >       ++				release_patch(&patch);
>
> Thank you for using release_patch() rather than the "free random struct
> members" approach that was suggested.

Indeed, it sounded like a specific suggestion to release a specific
`struct` member, but that was of course not the end of the story, and I
completed the analysis myself that I had wished the reviewer would have
completed _before_ sending off a review.

> >       @@ t/t3206-range-diff.sh: test_expect_success 'ranges with pathspecs'
> >       '
> >        +test_expect_success 'compare range vs mbox' '
> >        +	git format-patch --stdout topic..mode-only-change >mbox &&
> >        +	git range-diff topic...mode-only-change >expect &&
> >       ++
> >        +	git range-diff mode-only-change..topic mbox:./mbox >actual &&
> >        +	test_cmp expect actual &&
> >       -+	git range-diff mode-only-change..topic mbox:- <mbox >actual &&
> >       -+	test_cmp expect actual
> >       ++
>
> I'm a bit confused by this sed command, I've annotated it with my probably
> flawed understanding.
>
> >       ++	sed -e "/^From: .*/{
> >       ++		h
>
> This stores the From: header in the hold space

👍

> >       ++		s/.*/From: Bugs Bunny <bugs@xxxxxx>/
>
> This changes the From: header in the pattern space

👍

> >       ++		:1
> >       ++		N
> >       ++		/[ -z]$/b1
>
> We loop until we find a line that does not end with a space, letter or number
> adding the lines to the hold space

I would have _loved_ to match on an empty line, i.e. `/[^\n]$/b1`. But
that construct is not understood by the `sed` on macOS.

I even went so far as to search for the source code of a BSD `sed` (and I
found it, and modified the code so that it builds on Linux, see
https://github.com/dscho/sed-bsd) to try a couple of things, but could not
make it work with any variation of `\n`. Therefore, I settled on expecting
all the lines in the commit header to end in printable ASCII characters
between the space and the `z`.

> >       ++		G
>
> This appends the hold space to the pattern space, then the pattern space is
> printed.

👍

> Doesn't this mean we end up with two From: headers? Is the in-body From:
> line already present?

There is no in-body `From:` header because the patch author matches the
`GIT_AUTHOR_IDENT` that is in effect while running the `format-patch`
command.

Let me show you what this `sed` call deals with. In the local test run, it
modified an `mbox` starting with this:

	From 4d39cb329d3ef4c8e69b43859c2e11adb83f8613 Mon Sep 17 00:00:00 2001
	From: Thomas Rast <trast@xxxxxxxxxxx>
	Date: Mon, 22 Jul 2013 11:23:44 +0200
	Subject: [PATCH 1/3] s/4/A/ + add other-file

	---
	  file       | 2 +-
	  other-file | 0
	  [...]

to a modified `mbox` that starts with this:

	From 4d39cb329d3ef4c8e69b43859c2e11adb83f8613 Mon Sep 17 00:00:00 2001
	From: Bugs Bunny <bugs@xxxxxx>
	Date: Mon, 22 Jul 2013 11:23:44 +0200
	Subject: [PATCH 1/3] s/4/A/ + add other-file

	From: Thomas Rast <trast@xxxxxxxxxxx>
	---
	  file       | 2 +-
	  other-file | 0
	  [...]

>
> >       ++	}" <mbox >mbox.from &&
> >       ++	git range-diff mode-only-change..topic mbox:./mbox.from
> >       >actual.from &&
> >       ++	test_cmp expect actual.from &&
> >       ++
> >       ++	append_cr <mbox >mbox.cr &&
> >       ++	test_must_fail git range-diff \
> >       ++		mode-only-change..topic mbox:./mbox.cr 2>err &&
> >       ++	grep CR/LF err &&
>
> Thanks for adding that

Thank you for the suggestion to add it!

And thank you again for modeling how to perform actionable, helpful and
productive code reviews,
Dscho

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux