Hi!
On 9/21/21 6:06 PM, наб wrote:
Hi!
On Tue, Sep 21, 2021 at 05:20:32PM +0200, Alejandro Colomar (man-pages) wrote:
Are you sure?
So, it seems to me that by using {yes,no}expr and not {yes,no}str, it is
limiting itself to the first letter, as the current BUGS section specifies.
Right?
Quite sure:
localedata/locales/am_ET:yesexpr "^([+1yY<U12CE>]|<U12A0><U12CE><U1295>)"
Granted, I, unfortunately, don't strictly read Amharic
(but a cursory glance at a dictionary shows "አዎን" means yes),
but I do Ukrainian:
localedata/locales/uk_UA:yesexpr "^([+1Yy]|[<U0422><U0442>][<U0410><U0430>][<U041A><U043A>]?)$"
which works out to
"^([+1Yy]|[Тт][Аа][Кк]?)$"
This is odd, data-wise, but it's decidedly not just the first letter
(but it does match, what, "^y$", "^та$", and "^так$"? very odd!!).
On current glibc, if I were in a uk_UA locale,
"nyes" is -1, not 0 like this page would lead me to believe,
and, similarly, in am_ET, "አ" (-1) is not the same as "አዎን" (1).
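
To make the anchoring concrete, here is a minimal sketch that feeds a few answers to the uk_UA expression through regcomp(3). The helper name uk_yes() is hypothetical, and the bracket expressions are rewritten as alternations (an assumption on my part, so that the pattern also works byte-wise outside a UTF-8 locale); the intent is the same as the glibc data quoted above.

```c
#include <regex.h>
#include <stddef.h>

/* Byte-safe rewrite of the uk_UA yesexpr quoted above: (Т|т) instead of
 * [Тт], so the match does not depend on a UTF-8 locale being active.
 * This rewrite is an illustration, not the text shipped by glibc. */
static const char uk_yesexpr[] = "^([+1Yy]|(Т|т)(А|а)(К|к)?)$";

/* Hypothetical helper: 1 if `response` matches uk_yesexpr, 0 if it does
 * not, -1 if the expression fails to compile. */
int uk_yes(const char *response)
{
    regex_t re;
    int rc;

    if (regcomp(&re, uk_yesexpr, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, response, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Because of the trailing "$", uk_yes("y"), uk_yes("та") and uk_yes("так") match, while uk_yes("yes") and uk_yes("nyes") do not.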
FreeBSD (and, presumably, everyone else) uses CLDR data,
which provides something much more sensible:
[1] ^(([yY]([eE][sS])?)|([yY]))
[2] ^(([дД]([аА])?)|([дД])|([yY]([eE][sS])?)|([yY]))
This, admittedly, is not perfect, but the code that generates it [3]
explicitly handles full yesstr words because the data itself [4] is
constructed around yesstr, and yesexpr is a generated expression that
matches yesstr ‒ they're the same.
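
For comparison, a small sketch of how the en_US expression from [1] behaves under regcomp(3). The helper name cldr_en_yes() is hypothetical; the pattern is copied from the line marked [1] above. Note that it is anchored only at the start, so it accepts the full word but also anything else beginning with "y".

```c
#include <regex.h>
#include <stddef.h>

/* en_US yesexpr as quoted at [1]; anchored with "^" but not with "$". */
static const char cldr_en_yesexpr[] = "^(([yY]([eE][sS])?)|([yY]))";

/* Hypothetical helper: 1 if `response` matches, 0 if not,
 * -1 if the expression fails to compile. */
int cldr_en_yes(const char *response)
{
    regex_t re;
    int rc;

    if (regcomp(&re, cldr_en_yesexpr, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, response, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this, "yes", "Y" and "ynever" all match, and "nyes" does not.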
rpmatch() is a correct (well, /the/ correct) approach to handling this
(or, well, an equivalent on libcs that lack it, it's like seven lines) ‒
if a similar warning were prudent, and I very much believe it is /not/,
it'd belong in nl_langinfo() {YES,NO}EXPR or langinfo.h,
but it'd be a warning /for the end-user/, who, presumably,
knows the language they speak, not for the programmer.
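
For reference, the "equivalent on libcs that lack it" could look roughly like the sketch below. The name my_rpmatch() is hypothetical, and this is not the glibc implementation; it only assumes the POSIX interfaces nl_langinfo(3) with YESEXPR/NOEXPR and regcomp(3).

```c
#include <assert.h>
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* 1 if `response` matches the extended regex `expr`, 0 if not,
 * -1 if `expr` fails to compile. */
static int matches(const char *expr, const char *response)
{
    regex_t re;
    int rc;

    if (regcomp(&re, expr, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, response, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical stand-in for rpmatch(3):
 * 1 for an affirmative answer, 0 for a negative one, -1 otherwise. */
int my_rpmatch(const char *response)
{
    assert(response != NULL);
    if (matches(nl_langinfo(YESEXPR), response) == 1)
        return 1;
    if (matches(nl_langinfo(NOEXPR), response) == 1)
        return 0;
    return -1;
}
```

The result depends entirely on the YESEXPR/NOEXPR data of the active LC_MESSAGES locale; a caller that wants the user's language would call setlocale(LC_ALL, "") first.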
So, it seems that some locales try to do some extra work, and Ukrainian
seems to be doing a good job. I had a bit of bad luck with the Spanish
one... However, it seems that the C locale is also unfortunate:
user@sqli:~/src/test$ cat rpmatch.c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *str;

    str = "ynever; don't even think about it!";
    printf("%s;; %i;; %s\n", setlocale(LC_MESSAGES, NULL), rpmatch(str), str);
    return 0;
}
user@sqli:~/src/test$ cc -Wall -Wextra -Werror rpmatch.c
user@sqli:~/src/test$ ./a.out
C;; 1;; ynever; don't even think about it!
Since the C locale is the most important one, IMHO, and it is as
problematic as the BUGS section mentions, I think we should keep the
warning, and maybe add a mention that it depends on the locale. What do
you think?
Thanks,
Alex
наб
1. https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/share/msgdef/en_US.UTF-8.src
2. https://github.com/freebsd/freebsd-src/blob/373ffc62c158e52cde86a5b934ab4a51307f9f2e/share/msgdef/ru_RU.UTF-8.src
3. https://github.com/unicode-org/cldr/blob/62c90a357dc25911db60fcdf7d5a80119df27963/tools/cldr-code/src/main/java/org/unicode/cldr/posix/POSIXUtilities.java#L336
4. https://github.com/unicode-org/cldr/blob/62c90a357dc25911db60fcdf7d5a80119df27963/common/main/ru.xml#L15789
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/