On Thu, Nov 30, 2017 at 05:18:33AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 30, 2017 at 02:07:19AM +0000, Al Viro wrote:
> > Incidentally, grepping for sys_close() shows another piece of fun in
> > net/netfilter/xt_bpf.c.  Folks, ONCE DESCRIPTOR IS INSTALLED, THAT'S
> > IT; THERE'S NO REMOVING IT ON FAILURE EXITS.  sys_close() should
> > never, ever be used that way.  Sigh...
>
> Would be great do unexport the thing.  Except that we also have
> binfmt_misc (which looks legit) and autofs4, which on crack decided
> that close() isn't a fun syscall, they'd much rather have an ioctl
> that does exactly the same..

Yes, since the binfmt_misc one is guaranteed that its descriptor table
is not shared - all callchains go through do_execveat_common(), where
we'd use unshare_files().

The autofs one is... not in good taste, but still safe; there the
descriptor is preexisting and it's essentially a weird way of spelling
close(2).

References from syscall tables are, of course, OK.  The init/*.c uses
are done pretty much from userland - they could have been straight
syscalls, if not for the lack of klibc in the kernel tree.

Everything else, though...  IMO we need a whack-a-mole list somewhere;
"new callers of sys_close() anywhere outside of init/* and syscall
tables" definitely should be on it...
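
To make the "no removing it on failure exits" point concrete, here is a minimal userspace sketch, not kernel code. The helper name cleanup_close_hits_wrong_file() is made up for illustration. It replays, in a single thread and in a fixed order, what a second thread sharing the descriptor table is free to do as soon as a descriptor number becomes visible: close it and get the same number back for an unrelated file. The later "cleanup" close by number then kills the wrong file.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical illustration helper (userspace, not kernel code).
 * Returns 1 if the failure-path close() destroyed a descriptor that
 * was no longer ours, 0 if it did not, -1 if the setup itself failed.
 */
int cleanup_close_hits_wrong_file(void)
{
	int ours, theirs, reused;

	/* Step 1: the descriptor becomes visible in the table. */
	ours = open("/dev/null", O_RDONLY);
	if (ours < 0)
		return -1;

	/*
	 * Step 2: what another thread sharing the table may do at any
	 * point after step 1.  Done inline here so that the ordering is
	 * deterministic.  open() hands out the lowest free number, so
	 * the slot we just lost is reused.
	 */
	close(ours);
	theirs = open("/dev/null", O_WRONLY);
	if (theirs < 0)
		return -1;
	reused = (theirs == ours);

	/* Step 3: our failure exit "undoes" step 1 by number. */
	close(ours);

	/* If the number was reused, their file is now gone (EBADF). */
	if (reused && fcntl(theirs, F_GETFD) == -1)
		return 1;

	if (!reused)
		close(theirs);
	return 0;
}
```

Calling the helper from a single-threaded program is expected to return 1, since nothing else can take the freed slot between the close() and the second open(). In the kernel, the usual way to avoid the problem is to reserve the number with get_unused_fd_flags(), do everything that can fail, release it with put_unused_fd() on the failure exits, and call fd_install() only as the very last step.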