Re: [PATCH 1/2] rpmatch.3: remove first-character-only FUD

Hi!

On 9/21/21 6:06 PM, наб wrote:
Hi!

On Tue, Sep 21, 2021 at 05:20:32PM +0200, Alejandro Colomar (man-pages) wrote:
Are you sure?

So, it seems to me that by using {yes,no}expr and not {yes,no}str, it is
limiting itself to the first letter, as the current BUGS section specifies.
Right?
Quite sure:
	localedata/locales/am_ET:yesexpr "^([+1yY<U12CE>]|<U12A0><U12CE><U1295>)"
Granted, I, unfortunately, don't strictly read Amharic
(but a cursory glance at a dictionary shows "አዎን" means yes),
but I do Ukrainian:
	localedata/locales/uk_UA:yesexpr "^([+1Yy]|[<U0422><U0442>][<U0410><U0430>][<U041A><U043A>]?)$"
which works out to
	"^([+1Yy]|[Тт][Аа][Кк]?)$"

This is odd, data-wise, but it's decidedly not just the first letter
(but it does match, what, "^y$", "^та$", and "^так$"? very odd!!).

On current glibc, if I was in a uk_UA locale,
"nyes" is -1, not 0 like this page would lead me to believe,
and, similarly, in am_ET, "አ" (-1) is not the same as "አዎን" (1).
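
To make the effect of the trailing "$" concrete, here is a minimal sketch that feeds two expressions straight to regcomp(3). It assumes the C locale's yesexpr is "^[yY]" (what POSIX specifies for the POSIX locale), and it uses only the ASCII branch of the uk_UA expression quoted above, so the result does not depend on the character encoding in use. ere_matches() and both macro names are made up for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Assumption: yesexpr of the C/POSIX locale (unanchored at the end). */
#define C_YESEXPR        "^[yY]"
/* Assumption: ASCII-only branch of the uk_UA yesexpr, Cyrillic branch dropped. */
#define UK_ASCII_YESEXPR "^([+1Yy])$"

/*
 * Hypothetical helper: returns 1 if s matches the extended regular
 * expression expr, 0 if it does not, and -1 if expr fails to compile.
 */
int ere_matches(const char *expr, const char *s)
{
	regex_t re;
	int ret;

	if (regcomp(&re, expr, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	ret = (regexec(&re, s, 0, NULL, 0) == 0);
	regfree(&re);
	return ret;
}
```

With these definitions, ere_matches(C_YESEXPR, "ynever") is 1, because only the first character is constrained, while ere_matches(UK_ASCII_YESEXPR, "ynever") and ere_matches(UK_ASCII_YESEXPR, "nyes") are both 0, because the expression is anchored at both ends.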

FreeBSD (and, presumably, everyone else) uses CLDR data,
which provides something much more sensible:
   [1] ^(([yY]([eE][sS])?)|([yY]))
   [2] ^(([дД]([аА])?)|([дД])|([yY]([eE][sS])?)|([yY]))

This, admittedly, is not perfect, but the code that generates it [3]
explicitly handles full yesstr words because the data itself [4] is
constructed around yesstr, and yesexpr is a generated expression that
matches yesstr ‒ they're the same.

rpmatch() is a correct (well, /the/ correct) approach to handling this
(or, well, an equivalent on libcs that lack it, it's like seven lines) ‒
if a similar warning were prudent, and I very much believe it is /not/,
it'd belong in nl_langinfo() {YES,NO}EXPR or langinfo.h,
but it'd be a warning /for the end-user/, who, presumably,
knows the language they speak, not for the programmer.
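
For reference, the "equivalent on libcs that lack it" could look roughly like the sketch below. It is not glibc's code, only an illustration built from nl_langinfo(3) with YESEXPR/NOEXPR and regcomp(3); the names my_rpmatch() and langinfo_matches() are invented here. Like rpmatch(), it tries the affirmative expression first and returns 1, 0, or -1.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if response matches the extended
 * regular expression expr, 0 otherwise (including when expr does
 * not compile).
 */
static int langinfo_matches(const char *expr, const char *response)
{
	regex_t re;
	int ret;

	if (regcomp(&re, expr, REG_EXTENDED | REG_NOSUB) != 0)
		return 0;
	ret = (regexec(&re, response, 0, NULL, 0) == 0);
	regfree(&re);
	return ret;
}

/*
 * Sketch of an rpmatch() stand-in: 1 for an affirmative response,
 * 0 for a negative one, -1 for anything else. The expressions come
 * from the locale selected with setlocale(); if the program never
 * calls setlocale(), that is the C locale.
 */
int my_rpmatch(const char *response)
{
	if (langinfo_matches(nl_langinfo(YESEXPR), response))
		return 1;
	if (langinfo_matches(nl_langinfo(NOEXPR), response))
		return 0;
	return -1;
}
```

What counts as "yes" is decided entirely by the locale's expressions, so whether a response is matched on its first character or as a whole word depends on the locale data, not on this function.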

So, it seems that some locales try to do some extra work, and Ukrainian seems to be doing a good job. I had a bit of bad luck with the Spanish one... However, it seems that the C locale is also unfortunate:


user@sqli:~/src/test$ cat rpmatch.c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const char *str;

	str = "ynever; don't even think about it!";
	printf("%s;; %i;; %s\n", setlocale(LC_MESSAGES, NULL), rpmatch(str), str);
	return 0;
}
user@sqli:~/src/test$ cc -Wall -Wextra -Werror rpmatch.c
user@sqli:~/src/test$ ./a.out
C;; 1;; ynever; don't even think about it!


Since the C locale is the most important one, IMHO, and it is as problematic as the BUGS section mentions, I think we should keep the warning, and maybe add a mention that it depends on the locale. What do you think?

Thanks,

Alex


наб

1. https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/share/msgdef/en_US.UTF-8.src
2. https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/share/msgdef/ru_RU.UTF-8.src
3. https://github.com/unicode-org/cldr/blob/62c90a357dc25911db60fcdf7d5a80119df27963/tools/cldr-code/src/main/java/org/unicode/cldr/posix/POSIXUtilities.java#L336
4. https://github.com/unicode-org/cldr/blob/62c90a357dc25911db60fcdf7d5a80119df27963/common/main/ru.xml#L15789



--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


