Hello Johannes,

Johannes Schindelin wrote:
> On Fri, 22 Dec 2006, Junio C Hamano wrote:
>
> > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> >
> > > This adds utf8_byte_count(), utf8_strlen() and print_wrapped_text().
> > >
> > > The most important is probably utf8_strlen(), which returns the length
> > > of the text, if it is in UTF-8, otherwise -1.
> > >
> > > Note that we do not go the full nine yards: we could also check that
> > > the character is encoded with the minimum amount of bytes, as pointed
> > > out by Uwe Kleine-Koenig.
> > >
> > > The function print_wrapped_text() can be used to wrap text to a certain
> > > line length.
> >
> > If you do wrapped_text, I think you do not _want_ strlen (the
> > definition to me of strlen is "number of characters in the
> > string"). What you want is a function that returns the number
> > of columns consumed when displayed on monospace terminal.
>
> To me, characters are the symbols occupying one "column" each. Bytes are
> the 8-bit thingies that you usually use to encode the characters.

Quoting utf-8(7):

	are no longer valid in UTF-8 locales. Firstly, a single byte does
	not necessarily correspond any more to a single character.
	Secondly, since modern terminal emulators in UTF-8 mode also
	support Chinese, Japanese, and Korean double-width characters as
	well as non-spacing combining characters, outputting a single
	character does not necessarily advance the cursor by one position
	as it did in ASCII. Library functions such as mbsrtowcs(3) and
	wcswidth(3) should be used today to count characters and cursor
	positions.

I'd prefer using a similar naming scheme.

To acknowledge Junio, wcslen(3) (the wide-character equivalent of the
strlen() function) counts the number of (wide-)characters in a string.
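To make the distinction concrete, here is a minimal sketch (not the code from the patch) that reports all three quantities for a UTF-8 string: bytes, characters and columns. The names measure_utf8(), struct text_metrics, decode_utf8() and toy_width() are made up for illustration. In particular toy_width() is only a tiny stand-in for wcwidth(3) that knows a few ranges; real code should use wcwidth(3) or a complete width table. Like the patch, the decoder does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the three counts that are easy to mix up. */
struct text_metrics {
	size_t bytes;   /* what strlen(3) reports */
	size_t chars;   /* code points, cf. wcslen(3) */
	size_t columns; /* cursor advance, cf. wcswidth(3) */
};

/*
 * Decode one UTF-8 sequence from at most len bytes at s.
 * Returns the number of bytes consumed and stores the code point in *cp,
 * or returns 0 if the bytes are not a well-formed sequence.
 * Overlong encodings are NOT rejected in this sketch.
 */
static size_t decode_utf8(const unsigned char *s, size_t len, unsigned long *cp)
{
	size_t n, i;

	assert(s != NULL && cp != NULL);
	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		n = 2;
		*cp = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		n = 3;
		*cp = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		n = 4;
		*cp = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len < n)
		return 0; /* sequence is cut off */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;
		*cp = (*cp << 6) | (s[i] & 0x3f);
	}
	return n;
}

/* ASSUMPTION: toy width table, a stand-in for wcwidth(3). */
static int toy_width(unsigned long cp)
{
	if (cp >= 0x0300 && cp <= 0x036f)
		return 0; /* combining diacritical marks */
	if ((cp >= 0x4e00 && cp <= 0x9fff) || /* CJK unified ideographs */
	    (cp >= 0xac00 && cp <= 0xd7a3))   /* hangul syllables */
		return 2;
	return 1; /* everything else, including control characters */
}

/* Fills *m and returns 0, or returns -1 if s is not well-formed UTF-8. */
int measure_utf8(const char *s, struct text_metrics *m)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t left;

	assert(s != NULL && m != NULL);
	left = strlen(s);
	m->bytes = left;
	m->chars = 0;
	m->columns = 0;
	while (left > 0) {
		unsigned long cp;
		size_t n = decode_utf8(p, left, &cp);

		if (n == 0)
			return -1;
		p += n;
		left -= n;
		m->chars++;
		m->columns += toy_width(cp);
	}
	return 0;
}
```

For the two ideographs U+65E5 U+672C encoded in UTF-8 this gives 6 bytes, 2 characters and 4 columns; for "e" followed by the combining accent U+0301 it gives 3 bytes, 2 characters and 1 column. A wrapping function wants the column count, not either of the other two.
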
Best regards,
Uwe

-- 
Uwe Kleine-König

http://www.google.com/search?q=e+%5E+%28i+pi%29