On Mon, Jun 15, 2020 at 11:12 AM Jan Kara <jack@xxxxxxx> wrote: > > On Fri 12-06-20 23:34:16, Amir Goldstein wrote: > > > > > So maybe it would be better to list all users of alloc_file_pseudo() > > > > > and say that they all should be opted out of fsnotify, without mentioning > > > > > "internal mount"? > > > > > > > > > > > > > The users are DMA buffers, CXL, aio, anon inodes, hugetlbfs, anonymous > > > > pipes, shmem and sockets although not all of them necessary end up using > > > > a VFS operation that triggers fsnotify. Either way, I don't think it > > > > makes sense (or even possible) to watch any of those with fanotify so > > > > setting the flag seems reasonable. > > > > > > > > > > I also think this seems reasonable, but the more accurate reason IMO > > > is found in the comment for d_alloc_pseudo(): > > > "allocate a dentry (for lookup-less filesystems)..." > > > > > > > I updated the changelog and maybe this is clearer. > > > > > > I still find the use of "internal mount" terminology too vague. > > > "lookup-less filesystems" would have been more accurate, > > > > Only it is not really accurate for shmfs anf hugetlbfs, which are > > not lookup-less, they just hand out un-lookable inodes. > > OK, but I still think we are safe setting FMODE_NONOTIFY in > alloc_file_pseudo() and that covers all the cases we care about. Or did I > misunderstand something in the discussion? I can see e.g. > __shmem_file_setup() uses alloc_file_pseudo() but again that seems to be > used only for inodes without a path and the comment before d_alloc_pseudo() > pretty clearly states this should be the case. > > So is the dispute here really only about how to call files using > d_alloc_pseudo()? > Yes, semantics, no technical dispute on the patch. > > > because as you correctly point out, the user API to set a watch > > > requires that the marked object is looked up in the filesystem. > > > > > > There are also some kernel internal users that set watches > > > like audit and nfsd, but I think they are also only interested in > > > inodes that have a path at the time that the mark is setup. > > > > > > > FWIW I verified that watches can be set on anonymous pipes > > via /proc/XX/fd, so if we are going to apply this patch, I think it > > should be accompanied with a complimentary patch that forbids > > setting up a mark on these sort of inodes. If someone out there > > is doing this, at least they would get a loud message that something > > has changed instead of silently dropping fsnotify events. > > > > So now the question is how do we identify/classify "these sort of > > inodes"? If they are no common well defining characteristics, we > > may need to blacklist pipes sockets and anon inodes explicitly > > with S_NONOTIFY. > > We already do have FS_DISALLOW_NOTIFY_PERM in file_system_type->fs_flags so > adding FS_DISALLOW_NOTIFY would be natural if there is a need for this. Yes, it is possible, but for the specified use case, it is probably easier to classify by inode type (and maybe IS_ROOT()) than by filesystem type. Also, in the case of shmem, the same file_system_type is used for user mountable tmpfs and the kernel internal shm_mnt instance - only the latter is used for handing out anonymous shmem files. > > I don't think using fsnotify on pipe inodes is sane in any way. You'd > possibly only get the MODIFY or ACCESS events and even those would not be > quite reliable because with pipes stuff like splicing etc. is much more > common and that currently completely bypasses fsnotify subsystem. So > overall I'm fine with completely ignoring fsnotify on such inodes. > Agreed for MODIFY ACCESS. Not so sure about other events. I see that nfsd filecache backend only marks regular files, so that's fine. I *think* audit only marks directories and exe files, but completely unsure. Maybe there is no need to optimize out special inodes from all events and only exclude them from MODIFY/ACCESS, which are the only events where performance may be a concern? Or maybe you did not mean to skip events on special inodes in general? I am not sure how important OPEN events are on special inodes, but it is scary to stop sending OPEN_PERM events. Do you agree that we should also actively disallow setting a mark on special disconnected inodes? instead of silently dropping events that current kernel does deliver? We could disallow setting a mark on a disconnected inode (one that user is trying to configure by using a /proc/$pid/fd/X path). We can enforce this restriction for all backends in the common helper fsnotify_add_mark_locked(). Thanks, Amir.