Re: [REGRESSION] Re: [PATCH v8 15/19] mm: don't allow huge faults for files with pre content watches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 3, 2025 at 1:41 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Sun 02-02-25 11:04:02, Christian Brauner wrote:
> > On Sun, Feb 02, 2025 at 08:46:21AM +0100, Amir Goldstein wrote:
> > > On Sun, Feb 2, 2025 at 1:58 AM Linus Torvalds
> > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Sat, 1 Feb 2025 at 06:38, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > >
> > > > > Ok, but those "device fds" aren't really device fds in the sense that
> > > > > they are character fds. They are regular files afaict from:
> > > > >
> > > > > vfio_device_open_file(struct vfio_device *device)
> > > > >
> > > > > (Well, it's actually worse as anon_inode_getfile() files don't have any
> > > > > mode at all but that's beside the point.)?
> > > > >
> > > > > In any case, I think you're right that such files would (accidently?)
> > > > > qualify for content watches afaict. So at least that should probably get
> > > > > FMODE_NONOTIFY.
> > > >
> > > > Hmm. Can we just make all anon_inodes do that? I don't think you can
> > > > sanely have pre-content watches on anon-inodes, since you can't really
> > > > have access to them to _set_ the content watch from outside anyway..
> > > >
> > > > In fact, maybe do it in alloc_file_pseudo()?
> > > >
> > >
> > > The problem is that we cannot set FMODE_NONOTIFY -
> > > we tried that once but it regressed some workloads watching
> > > write on pipe fd or something.
> >
> > Ok, that might be true. But I would assume that most users of
> > alloc_file_pseudo() or the anonymous inode infrastructure will not care
> > about fanotify events. I would not go for a separate helper. It'd be
> > nice to keep the number of file allocation functions low.
> >
> > I'd rather have the subsystems that want it explicitly opt-in to
> > fanotify watches, i.e., remove FMODE_NONOTIFY. Because right now we have
> > broken fanotify support for e.g., nsfs already. So make the subsystems
> > think about whether they actually want to support it.
>
> Agreed, that would be a saner default.
>
> > I would disqualify all anonymous inodes and see what actually does
> > break. I naively suspect that almost no one uses anonymous inodes +
> > fanotify. I'd be very surprised.
> >
> > I'm currently traveling (see you later btw) but from a very cursory
> > reading I would naively suspect the following:
> >
> > // Suspects for FMODE_NONOTIFY
> > drivers/dma-buf/dma-buf.c:      file = alloc_file_pseudo(inode, dma_buf_mnt, "dmabuf",
> > drivers/misc/cxl/api.c: file = alloc_file_pseudo(inode, cxl_vfs_mount, name,
> > drivers/scsi/cxlflash/ocxl_hw.c:        file = alloc_file_pseudo(inode, ocxlflash_vfs_mount, name,
> > fs/anon_inodes.c:       file = alloc_file_pseudo(inode, anon_inode_mnt, name,
> > fs/hugetlbfs/inode.c:           file = alloc_file_pseudo(inode, mnt, name, O_RDWR,
> > kernel/bpf/token.c:     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > mm/secretmem.c: file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem",
> > block/bdev.c:   bdev_file = alloc_file_pseudo_noaccount(BD_INODE(bdev),
> > drivers/tty/pty.c: static int ptmx_open(struct inode *inode, struct file *filp)
> >
> > // Suspects for ~FMODE_NONOTIFY
> > fs/aio.c:       file = alloc_file_pseudo(inode, aio_mnt, "[aio]",
>
> This is just a helper file for managing aio context so I don't think any
> notification makes sense there (events are not well defined). So I'd say
> FMODE_NONOTIFY here as well.
>
> > fs/pipe.c:      f = alloc_file_pseudo(inode, pipe_mnt, "",
> > mm/shmem.c:             res = alloc_file_pseudo(inode, mnt, name, O_RDWR,
>
> This is actually used for stuff like IPC SEM where notification doesn't
> make sense. It's also used when mmapping /dev/zero but that struct file
> isn't easily accessible to userspace so overall I'd say this should be
> FMODE_NONOTIFY as well.

I think there is another code path that the audit missed for getting these
pseudo files not via alloc_file_pseudo():
ipc/shm.c:      file = alloc_file_clone(base, f_flags,

which does not copy f_mode as far as I can tell.

>
> > // Unsure:
> > fs/nfs/nfs4file.c:      filep = alloc_file_pseudo(r_ino, ss_mnt, read_name, O_RDONLY,
>
> AFAICS this struct file is for copy offload and doesn't leave the kernel.
> Hence FMODE_NONOTIFY should be fine.
>
> > net/socket.c:   file = alloc_file_pseudo(SOCK_INODE(sock), sock_mnt, dname,
>
> In this case I think we need to be careful. It's a similar case as pipes so
> probably we should use ~FMODE_NONOTIFY here from pure caution.
>

I tried this approach with patch:
"fsnotify: disable notification by default for all pseudo files"

But I also added another patch:
"fsnotify: disable pre-content and permission events by default"

So that code paths that we missed such as alloc_file_clone()
will not have pre-content events enabled.

Alex,

Can you please try this branch:

https://github.com/amir73il/linux/commits/fsnotify-fixes/

and verify that it fixes your issue.

The branch contains one prep patch:
"fsnotify: use accessor to set FMODE_NONOTIFY_*"
and two independent Fixes patches.

Assuming that it fixes your issue, can you please test each of the
Fixes patches individually, because every one of them should be fixing
the issue independently and every one of them could break something,
so we may end up reverting it later on.

Thanks,
Amir.





[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux