Re: [PATCH] mailsplit and mailinfo: gracefully handle NUL characters

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 16 May 2008 15:29:26 +0100 (BST)

Hi,

On Fri, 16 May 2008, Brian Foster wrote:

> Johannes Schindelin suggested:
> > The function fgets() has a big problem with NUL characters: it reads
> > them, but nobody will know if the NUL comes from the file stream, or
> > was appended at the end of the line.
> >
> > So implement a custom read_line() function.
>                         ^^^^^^^^^^^
>                         read_line_with_nul()
> meaning read part or all of one line which may contain NULs.

Right.

> >[ ... ]
> > diff --git a/builtin-mailsplit.c b/builtin-mailsplit.c
> > index 46b27cd..021dc16 100644
> > --- a/builtin-mailsplit.c
> > +++ b/builtin-mailsplit.c
> > @@ -45,6 +45,25 @@ static int is_from_line(const char *line, int len)
> >  /* Could be as small as 64, enough to hold a Unix "From " line. */
> >  static char buf[4096];
> >
> > +/* We cannot use fgets() because our lines can contain NULs */
> > +int read_line_with_nul(char *buf, int size, FILE *in)
> > +{
> > +     int len = 0, c;
> > +
> > +     for (;;) {
> > +             c = fgetc(in);
> > +             buf[len++] = c;
> > +             if (c == EOF || c == '\n' || len + 1 >= size)
> > +                     break;
> > +     }
> > +
> > +     if (c == EOF)
> > +             len--;
> > +     buf[len] = '\0';
> > +
> > +     return len;
> 
>  when fgetc(3) — why not use getc(3)? -

Because mailsplit can read from a file, too.

>  returns EOF it is pointlessly stored in buf[] (as a 'char'!), len's 
>  advanced, and then the storage and advancing are undone.  isn't that a 
>  bit silly?

I left it at that, because it is a rare case, the buffer has to be 
accessed with the trailing NUL anyway, and I think it is worth to have 
this function quite readable.  I, for one, am pretty certain that I 
understand what this function does, and how, in 6 months from now, without 
any additional documentation.

>  untested:
> 
> 	assert(2 <= size);
> 	do {
> 		if ((c = getc(in)) == EOF)
> 			break;
> 	} while (((buf[len++] = c) != '\n' && len+1 < size);
> 	buf[len] = '\0'
> 
> 	return len;

... except this is unreadable at best ;-)

>  I'd tend to write this in terms of pointers,
>  something along the lines (untested):
> 
> 	char	*p, *endp;
> 
> 	assert(1 <= size);
> 	p    = buf;
> 	endp = p + (size-1);
> 	while (p < endp) {
> 		if ((c = getc(in)) == EOF || (*p++ = c) == '\n')
> 			break;
> 	}
> 	*p = '\0';
> 
> 	return p - buf;

Again, I think this is too cuddled.  You have to think about every second 
line, and that makes for stupid mistakes with this developer.

Ciao,
Dscho