On Thu, Oct 6, 2022 at 5:25 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote: > On October 6, 2022 7:13:37 AM PDT, Jann Horn <jannh@xxxxxxxxxx> wrote: > >On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote: > >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily > >> > threaded process trying to perform a suid exec, causing the suid portion > >> > to fail. This counting error appears to be unneeded, but to catch any > >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up > >> > >> Isn't this a potential uapi break? Afaict, before this change a call to > >> clone{3}(CLONE_FS) followed by an exec in the child would have the > >> parent and child share fs information. So if the child e.g., changes the > >> working directory post exec it would also affect the parent. But after > >> this change here this would no longer be true. So a child changing a > >> workding directoro would not affect the parent anymore. IOW, an exec is > >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but > >> it seems like a non-trivial uapi change but there might be few users > >> that do clone{3}(CLONE_FS) followed by an exec. > > > >I believe the following code in Chromium explicitly relies on this > >behavior, but I'm not sure whether this code is in active use anymore: > > > >https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium > > Oh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed... > > It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition... Random idea that I haven't thought about a lot: One approach might be to not do it by counting, but instead have a flag on the fs_struct that we set when someone does a clone() with CLONE_FS but without CLONE_THREAD? Then we'd end up with the following possible states for fs_struct: - single-process, normal - single-process, pending execve past check_unsafe_exec() (prevent concurrent CLONE_FS) - shared between processes The slight difference from the old semantics would be that once you've used CLONE_FS without CLONE_THREAD, you can never do setuid execve() from your current process again (without calling unshare()), even if the child disappears in the meantime. I think that might be an acceptably small UAPI break.