On Tue, May 31, 2016 at 3:45 AM, Eric Wong <e@xxxxxxxxx> wrote: > Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote: >> On Mon, May 30, 2016 at 7:21 PM, Eric Wong <e@xxxxxxxxx> wrote: >> > + if (pp->fmt == CMIT_FMT_MBOXRD && >> > + !regexec(mboxrd_from, line, 0, 0, 0)) >> > + strbuf_addch(sb, '>'); >> >> At first glance, this seems dangerous since it's handing 'line' to >> regexec() without paying attention to 'linelen'. For an arbitrary >> regex, this could result in erroneous matches on subsequent "lines", >> however, since this expression is anchored with '^', it's not a >> problem. But, it is a bit subtle. > > Maybe having more context of the pp_remainder function and > seeing the get_one_line call would've helped in the diff; > I didn't think of this issue once I figured out where to > place the change. No, extra context wouldn't have helped. The problem is that get_one_line() merely returns the length of the line, which might be where the NUL-terminator is or it might be where the next newline is. Therefore, you can't rely upon the "current line" being NUL-terminated. So, in general, it's not "safe" to pass it to a function expecting the "line" to end at a NUL-terminator. > On the other hand, not being too familiar with git C APIs, I was > more worried strbuf was not NUL-terminated for regexec, but it > seems to be. Yes, that's a guarantee, but it doesn't help in this case. Given line="foo\nbar", get_one_line(line) will return 4, the length of "foo\n", but regexec() won't know to stop looking until it hits the NUL after the 'r'. An arbitrary regex, such as /bar/ will match beyond what get_one_line() considers the end-of-line, which is why this code looks scary (wrong) at first glance. >> I wonder if hand-coding, rather than using a regex, could be an improvement: >> >> static int is_mboxrd_from(const char *s, size_t n) >> { >> size_t f = strlen("From "); >> const char *t = s + n; >> >> while (s < t && *s == '>') >> s++; >> return t - s >= f && !memcmp(s, "From ", f); >> } >> >> or something. > > Yikes. I mostly work in high-level languages and do my best to > avoid string parsing in C; so that scares me. A lot. The hand-coded is_mboxrd_from() above is pretty much idiomatic C and (I think) typical of how such a function would be coded in Git itself, so it looks normal and easy to grok to me (but, of course, I'm probably biased since I wrote it). > I admit a regex isn't necessary, but I'm wondering if the above > could be made less frightening to someone like me. Perhaps, but it's difficult to say without knowing how it frightens you. > Maybe extra test cases + valgrind can quell my fears :) I can envision tests such as: "" "F" "From" "From " "From " "From foobar" and so on, if that's what you mean. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html