On Tue, Oct 03, 2017 at 03:07:24AM +0100, Al Viro wrote:
> On Tue, Oct 03, 2017 at 02:50:42AM +0200, Adam Borowski wrote:
> > Anything with bytes 1-31,127 will get -EACCES.
> >
> > Especially \n is bad: instead of natural file-per-line, you need an
> > user-unfriendly feature of -print0 added to every producer and consumer;
> > a good part of users either don't know or don't feel the need to bother
> > with escaping this snowflake, thus introducing security holes.
> >
> > The rest of control characters, while not as harmful, don't have a
> > legitimate use nor have any real chance of coming from a tarball (malice
> > and fooling around excluded). No character set ever supported as a system
> > locale by glibc, and, TTBMK, by other popular Unices, includes them, thus
> > it can be assumed no foreign files have such names other than artificially.
> >
> > This goes in stark contrast with other characters proposed to be banned:
> > non-UTF8 is common, and even on my desktop's disk I found examples of all
> > of: [ ], < >, initial -, initial and final space, ?, *, .., ', ", |, &.
> > Somehow no \ anywhere. I think I have an idea why no / .
> >
> > Another debatable point is whether to -EACCES or to silently rename to an
> > escaped form such as %0A. I believe the former is better because:
> > * programs can be confused if a directory has files they didn't just write
> > * many filesystems already disallow certain characters (like invalid
> >   Unicode), thus returning an error is consistent
> >
> > An example of a write-up of this issue can be found at:
> > https://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
>
> That essay is full of shit, and you've even mentioned parts of that just
> above...

I used it as a list of problems, not solutions.

> NAK; you'd _still_ need proper quoting (or a shell with something resembling an
> actual syntax, rather than the "more or less what srb had ended up implementing"),
> so it doesn't really buy you anything.

Well, what about just \n then? Unlike all the others which are relatively
straightforward, \n requires -print0 which not all programs implement, and
way too many people consider too burdensome to use.

> Badly written script will still be exploitable.

Yeah, but we'd kill a major exploit avenue.

> And since older kernels and other Unices are not going away, you would've
> created an inconsistently vulnerable set of scripts, on top of the false
> sense of security.

That shouldn't stop us from improving new kernels -- scripts that have
-print0 won't lose it, those that don't will have a vulnerability fixed.
Same as with any other kind of hardening.

As for other Unices: Theo de Raadt is not someone to object to a trivial
security patch, FreeBSD would follow, OSX is too hostile to developers for
me to care.

Thus, the only concern is new userland on old kernels. But distributions
don't support such combinations for long, unlike the other way around.

As for people writing their own scripts: they already tend to be vulnerable.
I for example, when writing an ad-hoc pipeline, tend to first make it
display files that'd be processed; switching that to -print0 back and forth
would be really tedious thus I usually remain vulnerable to \n (unless the
script is meant for external use -- but it's too easy to forget).

And how do you propose to process a list of files with grep or sed if there
are newlines involved?

Basic quotes make it trivial to handle everything but two snowflakes: \n and
initial -; the latter you need to remember about but ./* or -- aren't hard.
This leaves \n.

Thus, would you consider banning just newlines?


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ We domesticated dogs 36000 years ago; together we chased
⣾⠁⢰⠒⠀⣿⡁ animals, hung out and licked or scratched our private parts.
⢿⡄⠘⠷⠚⠋⠀ Cats domesticated us 9500 years ago, and immediately we got
⠈⠳⣄⠀⠀⠀⠀ agriculture, towns then cities.     -- whitroth on /.
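
For illustration only (this is not part of the patch or the original message): a minimal Python sketch of why \n is the odd one out among the characters discussed above. The file names are made up, and `emit`/`parse` are hypothetical helpers standing in for a producer such as find and a line-based consumer; nothing here touches a real filesystem.

```python
# Hypothetical file names; the second contains an embedded newline,
# the third starts with a dash.
names = ["report.txt", "evil\n/etc/passwd", "-n"]


def emit(names, sep):
    """Serialize names the way a producer would: one separator after each name."""
    return "".join(name + sep for name in names)


def parse(stream, sep):
    """Split a stream back into names, dropping the empty tail after the last separator."""
    parts = stream.split(sep)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


# Newline-delimited (plain -print style): the embedded \n is
# indistinguishable from a record separator.
by_line = parse(emit(names, "\n"), "\n")

# NUL-delimited (-print0 style): NUL cannot occur in a file name,
# so the list round-trips unchanged.
by_nul = parse(emit(names, "\0"), "\0")

print(len(names), len(by_line), len(by_nul))
print(by_line)
print(by_nul == names)
```

With the newline separator the consumer sees four entries instead of three, one of them the bogus path `/etc/passwd`. The leading-dash name survives both encodings intact, which matches the point that it is a quoting problem on the consumer side rather than a framing problem.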