On Tue, Oct 03, 2017 at 02:50:42AM +0200, Adam Borowski wrote:
> Anything with bytes 1-31,127 will get -EACCES.
>
> Especially \n is bad: instead of natural file-per-line, you need an
> user-unfriendly feature of -print0 added to every producer and consumer;
> a good part of users either don't know or don't feel the need to bother
> with escaping this snowflake, thus introducing security holes.
>
> The rest of control characters, while not as harmful, don't have a
> legitimate use nor have any real chance of coming from a tarball (malice
> and fooling around excluded).  No character set ever supported as a system
> locale by glibc, and, TTBMK, by other popular Unices, includes them, thus
> it can be assumed no foreign files have such names other than artificially.
>
> This goes in stark contrast with other characters proposed to be banned:
> non-UTF8 is common, and even on my desktop's disk I found examples of all
> of: [ ], < >, initial -, initial and final space, ?, *, .., ', ", |, &.
> Somehow no \ anywhere.  I think I have an idea why no / .
>
> Another debatable point is whether to -EACCES or to silently rename to an
> escaped form such as %0A.  I believe the former is better because:
> * programs can be confused if a directory has files they didn't just write
> * many filesystems already disallow certain characters (like invalid
>   Unicode), thus returning an error is consistent
>
> An example of a write-up of this issue can be found at:
> https://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

That essay is full of shit, and you've even mentioned parts of that just
above...

NAK; you'd _still_ need proper quoting (or a shell with something resembling
an actual syntax, rather than the "more or less what srb had ended up
implementing"), so it doesn't really buy you anything.  Badly written script
will still be exploitable.  And since older kernels and other Unices are not
going away, you would've created an inconsistently vulnerable set of scripts,
on top of the false sense of security.
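
To make the quoting point concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything from the patch: the helper names and the file names are hypothetical, and shlex.split() only approximates POSIX shell word splitting. It shows that a name with no control characters at all still breaks a command line built without quoting, while a quoted one survives even an embedded newline.

```python
import shlex


def naive_command(name):
    # Hypothetical badly written script: pastes the name in unquoted.
    return "cat " + name


def quoted_command(name):
    # Same command, with the name quoted for a POSIX shell.
    return "cat " + shlex.quote(name)


def argv_seen(command):
    # Approximation of the argument vector a POSIX shell would build.
    return shlex.split(command)


if __name__ == "__main__":
    # Illustrative names: the first contains no control characters.
    for name in ("my file.txt", "two\nlines.txt"):
        print(repr(name))
        print("  unquoted:", argv_seen(naive_command(name)))
        print("  quoted:  ", argv_seen(quoted_command(name)))
```

With quoting, each name arrives as a single argument on any kernel; without it, the plain space is already enough to split the name in two.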