On Mon, Apr 01, 2019 at 01:09:47AM +0200, SZEDER Gábor wrote: > On Mon, Apr 01, 2019 at 12:01:04AM +0200, Andrei Rybak wrote: > > diff --git a/mailinfo.c b/mailinfo.c > > index b395adbdf2..4ef6cdee85 100644 > > --- a/mailinfo.c > > +++ b/mailinfo.c > > @@ -701,6 +701,13 @@ static int is_scissors_line(const char *line) > > c++; > > continue; > > } > > + if (!memcmp(c, "✂", 3)) { > > This character is tiny. Please add a comment that it's supposed to be > a Unicode scissors character. I think it might also be the first raw UTF-8 character in our source, which is otherwise ASCII. Usually we'd spell out the binary (with a comment). I think I agree with Junio's response, tough, that this is probably not a road we want to go down, unless this micro-format is being actively used in the wild (I have no idea, but I have never seen it). > Should we worry about this memcmp() potentially reading past the end > of the string when 'c' points to the last character? I also wondered if the existing memcmps for ">8", etc, would have this problem. They don't, but it's somewhat subtle. They are only 2 characters long, and the outer loop guarantees we have at least 1 character. So at most we will look at the NUL. But obviously a 3-byte sequence like this may invoke undefined behavior, and the existing memcmps encourage anybody adding code to do it wrong. I wonder if it's worth re-writing it like: diff --git a/mailinfo.c b/mailinfo.c index b395adbdf2..46b1b2a4a8 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -693,8 +693,8 @@ static int is_scissors_line(const char *line) perforation++; continue; } - if ((!memcmp(c, ">8", 2) || !memcmp(c, "8<", 2) || - !memcmp(c, ">%", 2) || !memcmp(c, "%<", 2))) { + if ((starts_with(c, ">8") || starts_with(c, "8<") || + starts_with(c, ">%") || starts_with(c, "%<"))) { in_perforation = 1; perforation += 2; scissors += 2; -Peff