On Mon, Nov 26, 2018 at 05:19:45PM -0500, Gabriel Krisman Bertazi wrote:
> +static int utf8_casefold(const struct nls_table *table,
> +			 const unsigned char *str, size_t len,
> +			 unsigned char *dest, size_t dlen)
> +{
> +	const struct utf8data *data = utf8nfkdicf(UNICODE_AGE(10,0,0));
> +	struct utf8cursor cur;
> +	size_t nlen = 0;
> +
> +	if (utf8ncursor(&cur, data, str, len) < 0)
> +		goto invalid_seq;
> +
> +	for (nlen = 0; nlen < dlen; nlen++) {
> +		dest[nlen] = utf8byte(&cur);
> +		if (!dest[nlen])
> +			return nlen;
> +		if (dest[nlen] == -1)
> +			break;
> +	}
> +invalid_seq:
> +	/* Treat the sequence as a binary blob. */
> +	memcpy(dest, str, len);
> +	return len;
> +
> +}

So it looks like the interface is that if the destination buffer is too
small OR if the string is not a valid UTF-8 string, we treat it as a
binary blob.

I wonder if we would be better off if this function actually signalled
that there is a problem (buffer too small, invalid UTF-8 string)?  It's
fine to treat it as a binary blob and copy it out to the destination
buffer, but I can imagine use cases where knowing this would be useful.
*Especially* the destination-buffer-too-small case; I'm actually a
little nervous about having it silently ignore that error condition and
just copy the binary blob.

Also, there *really* needs to be a check that dlen >= len before the
memcpy after the invalid_seq label.

					- Ted
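
For illustration only, here is a minimal user-space sketch of what an
error-signalling variant of the interface discussed above might look
like.  It is not the patch's code: fold_cursor and fold_next() are
hypothetical stand-ins for utf8ncursor()/utf8byte() that fold ASCII only
and reject any byte >= 0x80, and casefold_checked() and
casefold_or_blob() are made-up names.  The choice of -EINVAL and
-ENAMETOOLONG is likewise an assumption.  The folded byte is kept in an
int before it is stored, since an unsigned char compared against -1 can
never match.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's utf8cursor; ASCII only. */
struct fold_cursor {
	const unsigned char *s;
	size_t len;
	size_t pos;
};

/*
 * Hypothetical stand-in for utf8byte(): returns the next folded byte,
 * 0 at end of input (or at an embedded NUL), -1 on an invalid byte.
 */
static int fold_next(struct fold_cursor *cur)
{
	unsigned char c;

	if (cur->pos >= cur->len)
		return 0;
	c = cur->s[cur->pos++];
	if (c >= 0x80)
		return -1;
	return tolower(c);
}

/*
 * Returns the folded length on success, -EINVAL if the input is not
 * valid, -ENAMETOOLONG if dest cannot hold the folded string.
 */
static int casefold_checked(const unsigned char *str, size_t len,
			    unsigned char *dest, size_t dlen)
{
	struct fold_cursor cur = { str, len, 0 };
	size_t nlen;
	int c;

	for (nlen = 0; ; nlen++) {
		c = fold_next(&cur);	/* int, so -1 stays detectable */
		if (c < 0)
			return -EINVAL;
		if (c == 0)
			return (int)nlen;
		if (nlen >= dlen)
			return -ENAMETOOLONG;
		dest[nlen] = (unsigned char)c;
	}
}

/*
 * Caller-side policy: fall back to a binary-blob copy only for invalid
 * input, and only after checking that the blob fits in dest.
 */
static int casefold_or_blob(const unsigned char *str, size_t len,
			    unsigned char *dest, size_t dlen)
{
	int ret = casefold_checked(str, len, dest, dlen);

	if (ret != -EINVAL)
		return ret;	/* success, or buffer too small */
	if (len > dlen)
		return -ENAMETOOLONG;
	memcpy(dest, str, len);
	return (int)len;
}
```

With this split, a caller that cares about the difference can call
casefold_checked() directly, while a caller that wants the blob
behaviour gets it without an unchecked memcpy.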