On Tue, Apr 26, 2022 at 01:52:11PM +0200, Miklos Szeredi wrote: > On Tue, 26 Apr 2022 at 13:21, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Tue, Apr 26, 2022 at 1:38 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > One thing that I just remembered and which I think I haven't mentioned > > > so far is that moving S_ISGID stripping from filesystem callpaths into > > > the vfs callpaths means that we're hoisting this logic out of vfs_*() > > > helpers implicitly. > > > > > > So filesystems that call vfs_*() helpers directly can't rely on S_ISGID > > > stripping being done in vfs_*() helpers anymore unless they pass the > > > mode on from a prior run through the vfs. > > > > > > This mostly affects overlayfs which calls vfs_*() functions directly. So > > > a typical overlayfs callstack would be (roughly - I'm omw to lunch): > > > > > > sys_mknod() > > > -> do_mknodat(mode) // calls vfs_prepare_mode() > > > -> .mknod = ovl_mknod(mode) > > > -> ovl_create(mode) > > > -> vfs_mknod(mode) > > > > > > I think we are safe as overlayfs passes on the mode on from its own run > > > through the vfs and then via vfs_*() to the underlying filesystem but it > > > is worth point that out. > > > > > > Ccing Amir just for confirmation. > > > > Looks fine to me, but CC Miklos ... > > Looks fine to me as well. Overlayfs should share the mode (including > the suid and sgid bits), owner, group and ACL's with the underlying > filesystem, so clearing sgid based on overlay parent directory should > result in the same mode as if it was done based on the parent > directory on the underlying layer. Ah yes, good point. > > AFAIU this logic is not affected by userns or mnt_userns, but > Christian would be best to confirm that. It does depend on it as S_ISGID stripping requires knowledge about whether the caller has CAP_FSETID and is capable over the parent directory or if they are in the group the file is owned by. I think ultimately it might just come down to moving vfs_prepare_mode() into vfs_*() helpers and not into the do_*at() helpers. That would be cleaner anyway as right now we have this weird disconnect between vfs_tmpfile() and vfs_{create,mknod,mkdir}(). IOW, vfs_tmpfile() doesn't even have an associated do_*() wrapper where we could call vfs_prepare_mode() from. So ultimately it might be nicer if we do it in vfs_*() helpers anyway. The less pretty thing about it will be that the security_path_*() hooks also want a mode. Right now these hooks receive the mode as it's passed in from userspace minus umask but before S_ISGID stripping happens. Whereas I think they should really see what the filesystem sees and currently it's a bug that they see something else. I need to think about this a bit.