On Fri, Oct 13, 2023 at 10:56 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Fri, Oct 13, 2023 at 09:57:50AM +0300, Amir Goldstein wrote: > > On Thu, Oct 12, 2023 at 6:37 PM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > > > In commit f61b9bb3f838 ("fs: add a new SB_I_NOUMASK flag") we added a > > > new SB_I_NOUMASK flag that is used by filesystems like NFS to indicate > > > that umask stripping is never supposed to be done in the vfs independent > > > of whether or not POSIX ACLs are supported. > > > > > > Overlayfs falls into the same category as it raises SB_POSIXACL > > > unconditionally to defer umask application to the upper filesystem. > > > > > > Now that we have SB_I_NOUMASK use that and make SB_POSIXACL properly > > > conditional on whether or not the kernel does have support for it. This > > > will enable use to turn IS_POSIXACL() into nop on kernels that don't > > > have POSIX ACL support avoding bugs from missed umask stripping. > > > > > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > > > --- > > > Hey Amir & Miklos, > > > > > > This depends on the aforementioned patch in vfs.misc. So if you're fine > > > with this change I'd take this through vfs.misc. > > > > Generally, I'm fine with a version of this going through the vfs tree. > > > > > > > > Christian > > > --- > > > fs/overlayfs/super.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > > > index 9f43f0d303ad..361189b676b0 100644 > > > --- a/fs/overlayfs/super.c > > > +++ b/fs/overlayfs/super.c > > > @@ -1489,8 +1489,16 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc) > > > sb->s_xattr = ofs->config.userxattr ? ovl_user_xattr_handlers : > > > ovl_trusted_xattr_handlers; > > > sb->s_fs_info = ofs; > > > +#ifdef CONFIG_FS_POSIX_ACL > > > sb->s_flags |= SB_POSIXACL; > > > +#endif > > > > IDGI, if IS_POSIXACL() is going to turn into noop > > why do we need this ifdef? > > Because it's wrong to advertise SB_POSIXACL support if the kernel > doesn't support it at all. It's a minor correctness thing. I'll provide > more details below. > > > > > To me the flag SB_POSIXACL means that any given inode > > MAY have a custom posix acl. > > > > > sb->s_iflags |= SB_I_SKIP_SYNC | SB_I_IMA_UNVERIFIABLE_SIGNATURE; > > > + /* > > > + * Ensure that umask handling is done by the filesystems used > > > + * for the the upper layer instead of overlayfs as that would > > > + * lead to unexpected results. > > > + */ > > > + sb->s_iflags |= SB_I_NOUMASK; > > > > > > > Looks like FUSE has a similar pattern, although the testing and then > > setting of SB_POSIXACL is perplexing to me: > > > > /* Handle umasking inside the fuse code */ > > if (sb->s_flags & SB_POSIXACL) > > fc->dont_mask = 1; > > sb->s_flags |= SB_POSIXACL; > > > > And for NFS, why was SB_I_NOUMASK set for v4 and not for v3 > > when v3 clearly states: > > > > case 3: > > /* > > * The VFS shouldn't apply the umask to mode bits. > > * We will do so ourselves when necessary. > > */ > > sb->s_flags |= SB_POSIXACL; > > > > Feels like I am missing parts of the big picture? > > When a filesystem doesn't raise SB_POSIXACL the vfs will strip the > umask. When a filesystem does raise SB_POSIXACL the vfs will not strip > the umask. Instead, the umask is stripped during posix_acl_create() in > the individual filesystems. > > If the kernel doesn't support POSIX ACLs, i.e. CONFIG_FS_POSIX_ACL isn't > set then posix_acl_create() is a nop. Hence, on a kernel without > CONFIG_FS_POSIX_ACL any filesystem that raises SB_POSIXACL will not do > any umask stripping at all: not on the vfs level because the vfs sees > SB_POSIXACL and not on the filesystem level because posix_acl_create() > is a nop. > > This complication in umask handling is a side-effect of POSIX ACLs and > one that is really nasty. But the general idea is that you can't get one > without the other. IOW, obviously no one should raise SB_POSIXACL if > they don't intend to support POSIX ACLs... > > But as we all know, hackers like us gotta hack. So filesystems like NFS > do use SB_POSIXACL for exactly that. But life without nuanced > complications isn't worth living so NFS v3 and NFS v4 do it slightly > differently. > > NFS v3 uses SB_POSIXACL to avoid umask stripping in the VFS but NFS v3 > does optionally support POSIX ACLs > > #ifdef CONFIG_NFS_V3_ACL > .listxattr = nfs3_listxattr, > .get_inode_acl = nfs3_get_acl, > .set_acl = nfs3_set_acl, > #endif > > and so raising SB_POSIXACL makes sense in that case. But in all > likelihood NFS should conditionalize SB_POSIXACL for NFS v3 on > CONFIG_NFS_V3_ACL or - if they want to prevent umask stripping > completely - they also need to raise SB_I_NOUMASK. But I have zero idea > what's correct for them so I trust Jeff and other people interested in > NFS to figure out what they need. > > NFS v4 on the other hand clearly doesn't care about POSIX ACL support at > all because they don't support it in any form. So NFS v4 is only > interested in the side-effects of POSIX ACLs on umask handling. > > So for them SB_I_NOUMASK is clearly the right thing to do. > > My reasoning about overlayfs is that it falls into the same category as > NFS v4 albeit for slightly different reasons. As a stacking filesystems > with a writable upper layer overlayfs can never rely on umask handling > done in the VFS because it doesn't (easily) know up front whether the > underlying filesystems supports POSIX ACLs or not. > > And currently it always has to raise SB_POSIXACL to get the side-effects > on umask handling even if the kernel isn't compiled with POSIX ACL > support at all. IOW, you always want the upper layer to do the umask > handling. > > So the correct thing to do is to raise SB_I_NOUMASK to communicate > clearly that umask handling will always be done by the upper layer. And > that in turn allows you to stop raising SB_POSIXACL unconditionally. > > Because in fs/overlayfs/{inode.c,overlayfs.h} you can also see that all > ovl inode operations are nop-ed when CONFIG_FS_POSIX_ACL is turned off. > All right. Acked-by: Amir Goldstein <amir73il@xxxxxxxxx> Thanks for the explanation, Amir.