On 8/19/23, Christian Brauner <brauner@xxxxxxxxxx> wrote: > On Fri, Aug 18, 2023 at 09:12:39PM +0200, Mateusz Guzik wrote: >> On Fri, Aug 18, 2023 at 10:33:26AM -0700, Kees Cook wrote: >> > This is a double-check I left in place, since it shouldn't have been >> > reachable: >> > >> > /* >> > * may_open() has already checked for this, so it should be >> > * impossible to trip now. But we need to be extra cautious >> > * and check again at the very end too. >> > */ >> > err = -EACCES; >> > if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) || >> > path_noexec(&file->f_path))) >> > goto exit; >> > >> >> As I mentioned in my other e-mail, the check is racy -- an unlucky >> enough remounting with noexec should trip over it, and probably a chmod >> too. >> >> However, that's not what triggers the warn in this case. >> >> The ntfs image used here is intentionally corrupted and the inode at >> hand has a mode of 777 (as in type not specified). >> >> Then the type check in may_open(): >> switch (inode->i_mode & S_IFMT) { >> >> fails to match anything. >> >> This debug printk: >> diff --git a/fs/namei.c b/fs/namei.c >> index e56ff39a79bc..05652e8a1069 100644 >> --- a/fs/namei.c >> +++ b/fs/namei.c >> @@ -3259,6 +3259,10 @@ static int may_open(struct mnt_idmap *idmap, const >> struct path *path, >> if ((acc_mode & MAY_EXEC) && path_noexec(path)) >> return -EACCES; >> break; >> + default: >> + /* bogus mode! */ >> + printk(KERN_EMERG "got bogus mode inode!\n"); >> + return -EACCES; >> } >> >> error = inode_permission(idmap, inode, MAY_OPEN | acc_mode); >> >> catches it. >> >> All that said, I think adding a WARN_ONCE here is prudent, but I >> don't know if denying literally all opts is the way to go. >> >> Do other filesystems have provisions to prevent inodes like this from >> getting here? > > Bugs reported against the VFS from ntfs/ntfs3 are to be treated with > extreme caution. Frankly, if it isn't reproducible without a corrupted > ntfs/ntfs3 image it is to be dismissed until further notice. > > In this case it simply seems that ntfs is failing at ensuring that its > own inodes it reads from disk have a well-defined type. > > If ntfs fails to validate that its own inodes it puts into the icache > are correctly initialized then the vfs doesn't need to try and taper > over this. > > If ntfs fails at that, there's no guarantee that it doesn't also fail at > setting the correct i_ops for that inode. At which point we can check > the type in may_open() but we already used bogus i_ops the whole time on > some other inodes. > > We're not here to make up for silly bugs like this. That WARN belongs > into ntfs not the vfs. > Given the triggered WARN_ON it seemed to me this would be the operating procedure, I am happy it is not ;) Per your description and the one provided by Theodore I take it filesystems must not ship botched inodes like this one. While in this case this is a clear-cut bug in ntfs, I would argue the entire ordeal exposes a deficiency in VFS -- it should have a debug-only mechanism which catches cases like this early on. For example there could be a mandatory function to call when the filesystem claims it constructed the inode to assert a bunch on it -- it would not catch all possible problems, but would definitely catch this one (and VFS would have to detect the call was not made). Perhaps I should write a separate e-mail about this, but I'm surprised there is no debug-only (as in not present in production kernels) support for asserting the state. To give one example which makes me itchy see inode destruction. There are few checks in clear_inode, but past that there is almost nothing. Similarly there are quite a few comments how the caller is required to hold a given lock, which should have been converted to lockdep asserts years ago. I'm going to write something up later. -- Mateusz Guzik <mjguzik gmail.com>