Hi Dscho
Having slept on it I think I misunderstood what was happening in the
diff parsing yesterday
On 16/11/2022 14:40, Phillip Wood wrote:
+ } else if (state == MBOX_IN_DIFF) {
+ switch (line[0]) {
+ case '\0':
+ continue; /* ignore empty lines after diff */
+ case '+':
+ case '-':
+ case ' ':
+ if (!old_count && !new_count)
+ break;
This shouldn't happen in a well formed diff. Below we happily accept bad
counts, is there a reason to reject them here?
I think this might be picking up the "--" at the end of the patch as we
don't want to break here at the end of a hunk. If so then a comment
would be helpful.
+ if (old_count && line[0] != '+')
+ old_count--;
+ if (new_count && line[0] != '-')
+ new_count--;
The diff is malformed if old_count == 0 and we see '-' or ' ' or
new_count == 0 and we see '+' or ' '. The code is careful not to
decrement the count in that case so I think it is harmless to accept
diffs with bad line counts in the hunk header.
+ /* fallthrough */
+ case '\\':
+ strbuf_addstr(&buf, line);
+ strbuf_addch(&buf, '\n');
+ util->diffsize++;
I think this might be a better place to break if old_count and new_count
are both zero.
It would be the right place to break at the end of each hunk, but I
don't think we want to do that.
+ continue;
+ case '@':
+ if (parse_hunk_header(line, &old_count,
+ &new_count, &p))
+ break;
+
+ strbuf_addstr(&buf, "@@");
+ if (current_filename && *p)
+ strbuf_addf(&buf, " %s:",
+ current_filename);
+ strbuf_addstr(&buf, p);
+ strbuf_addch(&buf, '\n');
+ util->diffsize++;
+ continue;
+ }
This is effectively the `default:` clause as it is executed when we
don't handle the line above. We ignore the contents of this line which
makes me wonder what happens if it is the start of another diff.
We'll pick that up earlier with "if (starts_with(line, "diff --git"))"
We only get here at the end of a patch (assuming it has the "--" line
from format-patch)
Best Wishes
Phillip
Do we
have tests that alter more than one file in a single commit?
I think this is a useful addition, it could perhaps benefit from more
tests though. Having tests for bad input, "\r\n" line endings and
getting the author from a From: header as well as an in-body From: line
would give a bit more reassurance about how robust the parser is.
Best Wishes
Phillip