> But it's still no better than the patch here in the happy case, since we
> still have to perform three fcntl() checks to figure out that all three
> descriptors are initialized as-expected (versus just one open() and
> close()).

As an alternative to performing three syscalls for the check, one could
call open(2) with O_RDONLY (O_PATH would also work, but does not yet seem
to be used in the git source) on a common path ("/", "/dev/null", ...)
and skip the sanitization if the returned descriptor is greater than 2.
Since open(2) returns the lowest unused descriptor, a result greater
than 2 means that 0, 1 and 2 are all already open. This would take two
syscalls (open + close) in the common case, the same as the current code.

> If Christian can tighten
> the environment into somewhat unnatural "opening writable FD is a
> failure" way, I suspect such a jail can be augmented to further to
> allow opening /dev/null and other "selected" files writable, so I
> wouldn't worry too much if we dropped this patch entirely.

The seccomp filter only gets the address of the memory where the path is
stored, so simple allow-listing of paths is not possible. And even when
inspecting the path, one would need to avoid TOCTOU attacks (the filter
seeing different memory contents at check time than the kernel sees at
use time).
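
For illustration, the open(2)-based check could look roughly like the
sketch below. It is untested against the git tree; std_fds_are_open() is
a hypothetical helper name, and "/" is just one of the common paths
mentioned above. It relies on open(2) returning the lowest unused
descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not existing git code.
 *
 * Returns  1 if descriptors 0, 1 and 2 are all open,
 *          0 if at least one of them is closed,
 *         -1 if the probing open() itself failed.
 */
static int std_fds_are_open(void)
{
	/* open() hands out the lowest descriptor that is not in use. */
	int fd = open("/", O_RDONLY);

	if (fd < 0)
		return -1;

	/* The probe descriptor is not needed in either case. */
	close(fd);

	/* fd > 2 means 0, 1 and 2 were all taken before the probe. */
	return fd > 2;
}
```

A caller would run the existing sanitization only when this returns
something other than 1, so the common case stays at one open() plus one
close().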