On Tue, Dec 12, 2023 at 05:12:43PM -0500, Jeff King wrote: > When processing a header like a "From" line, mailinfo uses > unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized > comments. It takes a NUL-terminated string on input, and loops over the > "in" pointer until it sees the NUL. When it finds the start of an > interesting block, it delegates to helper functions which also increment > "in", and return the updated pointer. > > But there's a bug here: the helpers find the NUL with a post-increment > in the loop condition, like: > > while ((c = *in++) != 0) > > So when they do see a NUL (rather than the correct termination of the > quote or comment section), they return "in" as one _past_ the NUL > terminator. And thus the outer loop in unquote_quoted_pair() does not > realize we hit the NUL, and keeps reading past the end of the buffer. > > We should instead make sure to return "in" positioned at the NUL, so > that the caller knows to stop their loop, too. A hacky way to do this is > to return "in - 1" after leaving the inner loop. But a slightly cleaner > solution is to avoid incrementing "in" until we are sure it contained a > non-NUL byte (i.e., doing it inside the loop body). > > The two tests here show off the problem. Since we check the output, > they'll _usually_ report a failure in a normal build, but it depends on > what garbage bytes are found after the heap buffer. Building with > SANITIZE=address reliably notices the problem. The outcome (both the > exit code and the exact bytes) are just what Git happens to produce for > these cases today, and shouldn't be taken as an endorsement. It might be > reasonable to abort on an unterminated string, for example. The priority > for this patch is fixing the out-of-bounds memory access. Makes sense. > diff --git a/mailinfo.c b/mailinfo.c > index a07d2da16d..737b9e5e13 100644 > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line) > > static const char *unquote_comment(struct strbuf *outbuf, const char *in) > { > - int c; > int take_next_literally = 0; > > strbuf_addch(outbuf, '('); > > - while ((c = *in++) != 0) { > + while (*in) { > + int c = *in++; > if (take_next_literally == 1) { > take_next_literally = 0; > } else { > @@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in) > > static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in) > { > - int c; > int take_next_literally = 0; > > - while ((c = *in++) != 0) { > + while (*in) { > + int c = *in++; > if (take_next_literally == 1) { > take_next_literally = 0; > } else { I was wondering whether we want to convert `unquote_quoted_pair()` in the same way. It's not subject to the same issue because it doesn't return `in` to its callers. But it would lower the cognitive overhead a bit by keeping the coding style consistent. But please feel free to ignore this remark. Another thing I was wondering about is the recursive nature of these functions, and whether it can lead to similar issues like we recently had when parsing revisions with stack exhaustion. I _think_ this should not be much of a problem in this case though, as we're talking about mail headers here. While the length of header values isn't restricted per se (the line length is restricted to 1000, but with Comment Folding Whitespace values can span multiple lines), but mail provdiers will sure clamp down on mails with a "From:" header that is megabytes in size. So taken together, this looks good to me. Patrick
Attachment:
signature.asc
Description: PGP signature