Hi Darren, Darren Tucker wrote on Wed, Jan 20, 2016 at 09:39:33AM +1100: > On Wed, Jan 20, 2016 at 8:48 AM, Ingo Schwarze <schwarze@xxxxxxx> wrote: >> Martijn sent the following patch to me in private and agreed that i >> post it here. >> >> In any other program in OpenBSD base, i'd probably agree with the >> basic approach. Regarding OpenSSH, however, i worry whether wcwidth(3) >> can be used. While wcwidth(3) is POSIX, it is not ISO C. Does >> OpenSSH target platforms that don't provide wcwidth(3)? > OpenSSH nominally targets POSIX, but it builds on a wide enough range > of platforms that it's likely at least some don't have it. > > Our general approach is to target POSIX then implement any needed > missing bits either by stealing the implementation from OpenBSD, some > other BSD licensed source or writing from scratch. If we have to > we'll ifdef stuff but prefer not to. Sure, that's what i expected. >> If so, do you think the problem can be solved by simply providing >> US-ASCII support only on such platforms, but no UTF-8 support at all? > Yes. That's what I did with mblen when we picked up a need for that > via libedit for platforms with no wide character support. > > $ grep -i mblen openbsd-compat/*.h > openbsd-compat/openbsd-compat.h:#ifndef HAVE_MBLEN > openbsd-compat/openbsd-compat.h:# define mblen(x, y) (1) Uh oh. I'm not quite sure what consequences that might entail in libedit for sftp(1), which does use setlocale(LC_CTYPE, "")? Did you audit those consequences? > Is there any reason the same approach would not work with wcwidth? #define wcwidth(x) (1) /* NO!! */ would be a security risk. One purpose of wcwidth(3) is to weed out non-printable characters. Whatever replacement we use, we have to make sure it returns -1 for every non-printable character on every platform. We MUST NOT let the scp(1) progressmeter spew random Unicode characters taken from the network at the user's terminal. They might be control codes. #define wcwidth(x) (-1) /* not really */ is not a security issue, but it would completely break filename display even with the C/POSIX locale on those platforms. I briefly considered int wcwidth(wchar_t wc) /* might break? */ { if (wc < 0x20 || wc > 0x7e) return -1; return isprint((unsigned char)wc) ? 1 : -1; } But that isn't ideal either because as far as i know, ISO C doesn't require that wchar_t is internally represented in a way that puts ASCII in the range 0x00 to 0x7f. Using iswprint(3) is not a very good idea either because that is C99, not C89, and may not be available either. So if we can't get a real implementation of wcwidth(3) on some platform, it's better to completely disable UTF-8 and only allow US-ASCII. A real replacement implementation of wcwidth(3) is MUCH harder than a real replacement implementation of mbtowc(3) and mblen(3). It needs a big table of character ranges (see tmux(1)), while the latter can be done in 50 lines (see mandoc(1)). That's why i said: If we want full UTF-8 support on all platforms no matter what and must have a complete replacement wcwidth(3), we should complete djm@'s character filtering first. >> P.S. >> This patch also uses mbtowc(3), but i assume that's no problem >> because that's ANSI C. > I would not assume that its existence in the standard is equal to its > existence in all deployments :-) That said it looks like we can > implement it in libcompat if needed. Yes, mbtowc(3) and mblen(3), certainly, but even those only for UTF-8, not for any other locale. Yours, Ingo _______________________________________________ openssh-unix-dev mailing list openssh-unix-dev@xxxxxxxxxxx https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev