Re: [PATCH 1/2] libgit.a: add some UTF-8 handling functions

Hello Johannes,

Johannes Schindelin wrote:
> On Fri, 22 Dec 2006, Junio C Hamano wrote:
> 
> > Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> > 
> > > This adds utf8_byte_count(), utf8_strlen() and print_wrapped_text().
> > >
> > > The most important is probably utf8_strlen(), which returns the length
> > > of the text, if it is in UTF-8, otherwise -1.
> > >
> > > Note that we do not go the full nine yards: we could also check that
> > > the character is encoded with the minimum amount of bytes, as pointed
> > > out by Uwe Kleine-Koenig.
> > >
> > > The function print_wrapped_text() can be used to wrap text to a certain
> > > line length.
> > 
> > If you do wrapped_text, I think you do not _want_ strlen (the
> > definition to me of strlen is "number of characters in the
> > string").  What you want is a function that returns the number
> > of columns consumed when displayed on monospace terminal.
> 
> To me, characters are the symbols occupying one "column" each. Bytes are 
> the 8-bit thingies that you usually use to encode the characters.
Quoting utf-8(7):

	are no longer valid in UTF-8 locales.  Firstly, a single byte
	does not necessarily correspond any more to a single character.
	Secondly, since modern terminal emulators in UTF-8 mode also
	support Chinese, Japanese, and Korean double-width characters as
	well as non-spacing combining characters, outputting a single
	character does not necessarily advance the cursor by one
	position as it did in ASCII.  Library functions such as
	mbsrtowcs(3) and wcswidth(3) should be used today to count
	characters and cursor positions.

I'd prefer using a similar naming scheme.  And in support of Junio's
point: wcslen(3), the wide-character equivalent of strlen(), counts
the number of (wide) characters in a string, not columns.

Best regards,
Uwe

-- 
Uwe Kleine-König

http://www.google.com/search?q=e+%5E+%28i+pi%29
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
