Re: [PATCH] utf8.c: fix strbuf_utf8_replace copying the last NUL to dst string

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Duy Nguyen <pclouds@xxxxxxxxx> writes:

>> > it returns 0 and steps 'src' by one. 
>> 
>> Here "it" refers to utf8_width()?  Who steps 'src' by one?
>
> utf8_width() steps 'src'.
>
>> 
>> Ahh, did you mean *src == NUL, i.e. "already at the end of the
>> string"?
>
> Yes.. I guess you have a better commit message prepared for me now :)

Heh.  At least you realized that the log message was uninformative ;-)

>> I think utf8_width() called with an empty string should not move the
>> pointer past that end-of-string NUL in the first place.  It makes me
>> wonder if it would be a better fix to make it not to do that (and
>> return 0), but if we declare it is the caller's fault, perhaps we
>> may want to add
>> 
>> 	if (!**start)
>>         	die("BUG: empty string to utf8_width()???");
>> 
>> at the very beginning of utf8_width(), even before it calls
>> pick-one-utf8-char.
>> 
>> Still puzzled...
>
> I don't know. I mean, in a UTF-8 byte sequence, NUL is a valid
> character (part of the ASCII subset), so advancing '*start' by one
> makes sense.

Dubious.  NUL may be valid but it is valid only in the sense that it
will mark the end of the string, i.e. the caller is expected to do:

        const char *string = ...;

        while (not reached the end of the string) {
                this_char = pick_one_char(&string);
                do something to this_char;
        }

and there are two ways to structure such a loop:

 (1) Make pick-one-char signal the termination condition.  i.e.

	while ((this_char = pick_one(&string)) != EOF)
		do something to this_char;

     That's a typical getchar()-style loop.  It would better not
     advance string when it returns EOF.

 (2) Make it the responsibility of the caller not to call beyond the
     end. i.e.

	while (string < end) {
        	this_char = pick_one_char(&string);
		do something to this char;
	}

and your patch takes the latter, which I think is in line with the
way how existing callchain works.  So the addition of "BUG" merely
is to make sure that everybody follows that same convention.

We cannot declare that it is caller's responsibility to feed
sensible input to the function only at one callsite and allow other
callsites feed garbage to the same function, after all, no?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]