Re: [PATCH] range-diff: support reading mbox files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dscho

Having slept on it I think I misunderstood what was happening in the diff parsing yesterday

On 16/11/2022 14:40, Phillip Wood wrote:
+        } else if (state == MBOX_IN_DIFF) {
+            switch (line[0]) {
+            case '\0':
+                continue; /* ignore empty lines after diff */
+            case '+':
+            case '-':
+            case ' ':
+                if (!old_count && !new_count)
+                    break;

This shouldn't happen in a well formed diff. Below we happily accept bad counts, is there a reason to reject them here?

I think this might be picking up the "--" at the end of the patch as we don't want to break here at the end of a hunk. If so then a comment would be helpful.

+                if (old_count && line[0] != '+')
+                    old_count--;
+                if (new_count && line[0] != '-')
+                    new_count--;

The diff is malformed if old_count == 0 and we see '-' or ' ' or new_count == 0 and we see '+' or ' '. The code is careful not to decrement the count in that case so I think it is harmless to accept diffs with bad line counts in the hunk header.
+                /* fallthrough */
+            case '\\':
+                strbuf_addstr(&buf, line);
+                strbuf_addch(&buf, '\n');
+                util->diffsize++;

I think this might be a better place to break if old_count and new_count are both zero.

It would be the right place to break at the end of each hunk, but I don't think we want to do that.

+                continue;
+            case '@':
+                if (parse_hunk_header(line, &old_count,
+                              &new_count, &p))
+                    break;
+
+                strbuf_addstr(&buf, "@@");
+                if (current_filename && *p)
+                    strbuf_addf(&buf, " %s:",
+                            current_filename);
+                strbuf_addstr(&buf, p);
+                strbuf_addch(&buf, '\n');
+                util->diffsize++;
+                continue;
+            }

This is effectively the `default:` clause as it is executed when we don't handle the line above. We ignore the contents of this line which makes me wonder what happens if it is the start of another diff.

We'll pick that up earlier with "if (starts_with(line, "diff --git"))"

We only get here at the end of a patch (assuming it has the "--" line from format-patch)

Best Wishes

Phillip

Do we have tests that alter more than one file in a single commit?

I think this is a useful addition, it could perhaps benefit from more tests though. Having tests for bad input, "\r\n" line endings and getting the author from a From: header as well as an in-body From: line would give a bit more reassurance about how robust the parser is.

Best Wishes

Phillip



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux