"Richard W.M. Jones" <rjones@xxxxxxxxxx> wrote: > Jim Meyering wrote: >> "Richard W.M. Jones" <rjones@xxxxxxxxxx> wrote: >> >>> Jim Meyering wrote: >>>> "Richard W.M. Jones" <rjones@xxxxxxxxxx> wrote: >>>>> Jim Meyering wrote: >>>>>> What do you think of using this? >>>>>> >>>>>> isascii (*p) && isalnum (*p) >>>>> I'm not sure I'm qualified to say what this does on EBCDIC, but quite >>>>> likely lots of other code breaks there too anyway. This is nicely >>>>> self-documenting anyway. >>>> As Daniel suggested, isalnum is locale-sensitive. >>>> If there's a locale with an alphabetic byte that is outside >>>> the logical a-zA-Z range, yet still within 0..127, then the above >>>> expression will give a false-positive for that byte. >>>> >>>> I've been inclined to stop worrying about EBCDIC for years, but a quick >>>> search on the web finds that people are still stuck using it, and do >>>> report bugs in ASCII-assuming code. >>>> >>>> This is why autoconf goes to the trouble of doing this: >>>> tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ >>>> not this: >>>> tr a-z A-Z >>>> to convert to upper case. >>> Another factor to consider here is that it doesn't matter if this >>> function over-escapes, but it does matter if the function >>> under-escapes. That is to say, it could escape every character as a >>> %xx hex code, which would be ugly and slightly inefficient but not >>> wrong. >> >> IMHO, if you don't use the all-enumerating switch-based code that Daniel >> objects to, it'd be good to document (in both loops) that the test is >> inaccurate with EBCDIC, and explain why it's ok to get false positives. >> >> Without comments, people might be tempted to use a similar test in a >> context where the differences matter. > > OK, how about this? > > Rich. > > + for (p = str; *p; ++p) { > + /* Want to escape only A-Z and 0-9. This may not work on > EBCDIC. 
> */
> +    if (isascii (*p) && isalnum (*p))

Actually, with that, the code is at the mercy of locale definitions
(which are notoriously unreliable), and it probably works with EBCDIC.
I wrote the following and tested a few systems:

#include <ctype.h>
#include <stdio.h>
#include <locale.h>

int
is_alphanum (char c)
{
  switch (c)
    {
      /* generated by
         LC_ALL=C perl -e \
         "print map {qq(case '\$_': )}('a'..'z', 'A'..'Z', '0'..'9')"|fmt */
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
    case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B':
    case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I':
    case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P':
    case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W':
    case 'X': case 'Y': case 'Z': case '0': case '1': case '2': case '3':
    case '4': case '5': case '6': case '7': case '8': case '9':
      return 1;
    default:
      return 0;
    }
}

int
main ()
{
  setlocale (LC_ALL, "");
  /* Report every byte that isalnum and isascii both accept in the
     current locale, but that is not one of the 62 listed characters.  */
  for (unsigned int i = 0; i < 256; i++)
    if (isalnum (i) && isascii (i) && !is_alphanum (i))
      printf ("%d: %c\n", i, i);
  return 0;
}
-------------------------------

I compiled and ran it against all installed locales like this:

    gcc -o k k.c && for i in $(locale -a); do \
      test "$(LC_ALL=$i ./k|wc -l)" = 0 || echo $i;done

On RHEL4, RHEL5, and rawhide, it finds this exception:

    vi_VN.tcvn

Running manually in that locale suggests something is fishy:

    $ LC_ALL=vi_VN.tcvn ./k
    1:
    2:
    4:
    5:
    6:
    17:
    18:
    19:
    20:
    21:
    22:
    23:

Surprise, surprise...
So in this locale, using "isascii (*p) && isalnum (*p)" would *under*quote,
because it treats those twelve control bytes as alphanumeric and passes
them through unescaped.

I didn't expect to find such a convincing argument.
I stand by my suggestion to use the switch statement.
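
To make the over-escaping point concrete, here is a minimal sketch of an
escaping loop built on the switch-based test.  The name escape_string, its
signature, the caller-supplied buffer, and the uppercase %XX output are all
my assumptions for illustration; this is not the code from the patch under
discussion.  Anything outside the 62 enumerated characters is escaped, so a
misbehaving locale can only make the output uglier, never under-quoted.

```c
#include <assert.h>
#include <stddef.h>

/* Locale-independent: accepts only the 62 characters listed below,
   whatever LC_CTYPE says and whatever the execution character set is.  */
static int
is_alphanum (char c)
{
  switch (c)
    {
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
    case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B':
    case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I':
    case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P':
    case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W':
    case 'X': case 'Y': case 'Z': case '0': case '1': case '2': case '3':
    case '4': case '5': case '6': case '7': case '8': case '9':
      return 1;
    default:
      return 0;
    }
}

/* Hypothetical helper: copy STR into OUT (capacity OUTLEN bytes),
   replacing every byte that is not in the enumerated set with %XX.
   Returns 0 on success, -1 if OUT is too small.  */
static int
escape_string (const char *str, char *out, size_t outlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;
  const char *p;

  assert (str != NULL && out != NULL);

  for (p = str; *p; ++p)
    {
      unsigned char c = (unsigned char) *p;

      if (is_alphanum (*p))
        {
          /* Need room for this byte plus the trailing NUL.  */
          if (n + 1 >= outlen)
            return -1;
          out[n++] = *p;
        }
      else
        {
          /* Need room for three bytes plus the trailing NUL.  */
          if (n + 3 >= outlen)
            return -1;
          out[n++] = '%';
          out[n++] = hex[c >> 4];
          out[n++] = hex[c & 0xf];
        }
    }

  if (n >= outlen)
    return -1;
  out[n] = '\0';
  return 0;
}
```

On an ASCII system, escape_string ("abc XYZ/09", buf, sizeof buf) should
leave "abc%20XYZ%2F09" in buf, and a lone byte 1 (one of the bytes that
vi_VN.tcvn misclassifies) comes out as "%01" regardless of the locale.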