On Sunday, 2024-02-25 at 15:32 +0800, Xi Ruoyao wrote:
> On Sun, 2024-02-25 at 14:51 +0800, Icenowy Zheng wrote:
> > > From my point of view, I prefer to "restore fstat", because we
> > > need to use the Chrome sandbox every day (even though it hasn't
> > > been upstreamed yet). But I also hope "seccomp deep argument
> > > inspection" can be solved in the future.
> >
> > My idea is this problem needs syscalls to be designed with deep
> > argument inspection in mind; syscalls before this should be
> > considered a historical error and get fixed by restoring old
> > syscalls.
>
> I'd not consider fstat an error as using statx for fstat has a
> performance impact (severe for some workflows), and Linus has
> concluded

Sorry for the unclear wording; I mean statx is an error in ABI design,
not fstat.

> "if the user wants fstat, give them fstat" for the performance issue:
>
> https://sourceware.org/pipermail/libc-alpha/2023-September/151365.html
>
> However we only want fstat (actually "newfstat" in fs/stat.c), and it
> seems we don't want to resurrect newstat, newlstat, newfstatat, etc.
> (or am I missing any benefit - performance or "just pleasing seccomp" -
> of them compared to statx?) so we don't want to just define
> __ARCH_WANT_NEW_STAT. So it seems we need to add some new #if to
> fs/stat.c and include/uapi/asm-generic/unistd.h.
>
> And no, it's not a design issue of all other syscalls. It's just the
> design issue of seccomp. There's no way to design a syscall allowing
> seccomp to inspect a 100-character path in its argument unless
> refactoring seccomp entirely because we cannot fit a 100-character
> path into 8 registers.

Well, my point is that syscalls should be designed to be simple, to
prevent this kind of situation.

>
> As of now people do use PTRACE_PEEKDATA for "deep inspection"
> (actually "debugging" the target process) but it obviously has a very
> severe performance impact.
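
For reference, a minimal sketch of why a filter cannot tell an empty
path from a real one. It mirrors struct seccomp_data from
include/uapi/linux/seccomp.h with ctypes and only simulates what a
filter is handed; it does not install a filter. The helper name
filter_view, the fd value 3, the arch value 0 and the use of the
asm-generic statx number are assumptions made for illustration.

```python
import ctypes


class SeccompData(ctypes.Structure):
    """Mirror of struct seccomp_data (include/uapi/linux/seccomp.h)."""
    _fields_ = [
        ("nr", ctypes.c_int),                      # syscall number
        ("arch", ctypes.c_uint32),                 # AUDIT_ARCH_* value
        ("instruction_pointer", ctypes.c_uint64),
        ("args", ctypes.c_uint64 * 6),             # raw argument words only
    ]


def filter_view(nr, arch, *args):
    """Hypothetical helper: the bytes a BPF filter would see for one call."""
    data = SeccompData()
    data.nr = nr
    data.arch = arch
    for i, value in enumerate(args):
        data.args[i] = value
    return bytes(data)


NR_STATX = 291          # assumption: asm-generic syscall number
AT_EMPTY_PATH = 0x1000  # from include/uapi/linux/fcntl.h

# statx(fd, "", AT_EMPTY_PATH, ...): the "fstat through statx" pattern.
path = ctypes.create_string_buffer(b"", 128)
addr = ctypes.addressof(path)
before = filter_view(NR_STATX, 0, 3, addr, AT_EMPTY_PATH)

# Same buffer, same address, but now it holds a real path.
path.value = b"/etc/shadow"
after = filter_view(NR_STATX, 0, 3, addr, AT_EMPTY_PATH)

# The filter input is byte-for-byte identical: only the pointer is visible.
print(before == after)
print(ctypes.sizeof(SeccompData))
```

With fstat(fd, buf) there is no path argument at all, so the same filter
has nothing it would need to dereference.
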
>
> <rant>
>
> Today the entire software industry is saying "do things in a
> declarative way" but seccomp is completely the opposite. It's auditing
> *how* the sandboxed application is doing things instead of *what* will
> be done.
>
> I've raised my objections to seccomp and/or syscall allowlisting
> several times after seeing so many breakages like:
>
> - https://github.com/NetworkConfiguration/dhcpcd/issues/120
> - https://gitlab.gnome.org/GNOME/tracker-miners/-/issues/252
> - https://blog.pintia.cn/2018/06/27/glibc-segmentation-fault/
> - http://web.archive.org/web/20210126121421/http://acm.xidian.edu.cn/discuss/thread.php?tid=148&cid=#
>   (comment 3)
>
> but people just keep telling me "you are wrong, you don't understand
> security". Some of them even complain "seccomp is broken" as well but
> still keep using it.
>
> </rant>
>