Re: [PATCH] range-diff: support reading mbox files

Phillip Wood <phillip.wood123@xxxxxxxxx> · Thu, 17 Nov 2022 10:26:07 +0000

Hi Dscho

Having slept on it I think I misunderstood what was happening in the 
diff parsing yesterday

On 16/11/2022 14:40, Phillip Wood wrote:
+        } else if (state == MBOX_IN_DIFF) {
+            switch (line[0]) {
+            case '\0':
+                continue; /* ignore empty lines after diff */
+            case '+':
+            case '-':
+            case ' ':
+                if (!old_count && !new_count)
+                    break;

This shouldn't happen in a well formed diff. Below we happily accept bad 
counts, is there a reason to reject them here?

I think this might be picking up the "--" at the end of the patch as we 
don't want to break here at the end of a hunk. If so then a comment 
would be helpful.

+                if (old_count && line[0] != '+')
+                    old_count--;
+                if (new_count && line[0] != '-')
+                    new_count--;

The diff is malformed if old_count == 0 and we see '-' or ' ' or 
new_count == 0 and we see '+' or ' '. The code is careful not to 
decrement the count in that case so I think it is harmless to accept 
diffs with bad line counts in the hunk header.
+                /* fallthrough */
+            case '\\':
+                strbuf_addstr(&buf, line);
+                strbuf_addch(&buf, '\n');
+                util->diffsize++;

I think this might be a better place to break if old_count and new_count 
are both zero.

It would be the right place to break at the end of each hunk, but I 
don't think we want to do that.

+                continue;
+            case '@':
+                if (parse_hunk_header(line, &old_count,
+                              &new_count, &p))
+                    break;
+
+                strbuf_addstr(&buf, "@@");
+                if (current_filename && *p)
+                    strbuf_addf(&buf, " %s:",
+                            current_filename);
+                strbuf_addstr(&buf, p);
+                strbuf_addch(&buf, '\n');
+                util->diffsize++;
+                continue;
+            }

This is effectively the `default:` clause as it is executed when we 
don't handle the line above. We ignore the contents of this line which 
makes me wonder what happens if it is the start of another diff.

We'll pick that up earlier with "if (starts_with(line, "diff --git"))"

We only get here at the end of a patch (assuming it has the "--" line 
from format-patch)

Best Wishes

Phillip

Do we 
have tests that alter more than one file in a single commit?

I think this is a useful addition, it could perhaps benefit from more 
tests though. Having tests for bad input, "\r\n" line endings and 
getting the author from a From: header as well as an in-body From: line 
would give a bit more reassurance about how robust the parser is.

Best Wishes

Phillip