On Fri, Jun 12, 2020 at 4:18 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > On Fri, Jun 12, 2020 at 12:52:28PM +0300, Amir Goldstein wrote: > > On Fri, Jun 12, 2020 at 12:26 PM Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > The kernel uses internal mounts for a number of purposes including pipes. > > > On every vfs_write regardless of filesystem, fsnotify_modify() is called > > > to notify of any changes which incurs a small amount of overhead in fsnotify > > > even when there are no watchers. > > > > > > A patch is pending that reduces, but does not eliminte, the overhead > > > of fsnotify but for the internal mounts, even the small overhead is > > > unnecessary. The user API is based on the pathname and a dirfd and proc > > > is the only visible path for inodes on an internal mount. Proc does not > > > have the same pathname as the internal entry so even if fatrace is used > > > on /proc, no events trigger for the /proc/X/fd/ files. > > > > > > > This looks like a good direction and I was going to suggest that as well. > > However, I am confused by the use of terminology "internal mount". > > The patch does not do anything dealing with "internal mount". > > I was referring to users of kern_mount. I see. I am not sure if all kern_mount hand out only anonymous inodes, but FYI, now there a MNT_NS_INTERNAL that is not SB_KERNMOUNT: df820f8de4e4 ovl: make private mounts longterm > > > If alloc_file_pseudo() is only called for filesystems mounted as > > internal mounts, > > I believe this is the case and I did not find a counter-example. The > changelog that introduced the helper is not explicit but it was created > in the context of converting a number of internal mounts like pipes, > anon inodes to a common helper. If I'm wrong, Al will likely point it out. > > > please include this analysis in commit message. > > In any case, not every file of internal mount is allocated with > > alloc_file_pseudo(), > > right? > > Correct. It is not required and there is at least one counter example > in arch/ia64/kernel/perfmon.c but I don't think that is particularly > important, I don't think anyone is kept awake at night worrying about > small performance overhead on Itanium. > > > So maybe it would be better to list all users of alloc_file_pseudo() > > and say that they all should be opted out of fsnotify, without mentioning > > "internal mount"? > > > > The users are DMA buffers, CXL, aio, anon inodes, hugetlbfs, anonymous > pipes, shmem and sockets although not all of them necessary end up using > a VFS operation that triggers fsnotify. Either way, I don't think it > makes sense (or even possible) to watch any of those with fanotify so > setting the flag seems reasonable. > I also think this seems reasonable, but the more accurate reason IMO is found in the comment for d_alloc_pseudo(): "allocate a dentry (for lookup-less filesystems)..." > I updated the changelog and maybe this is clearer. I still find the use of "internal mount" terminology too vague. "lookup-less filesystems" would have been more accurate, because as you correctly point out, the user API to set a watch requires that the marked object is looked up in the filesystem. There are also some kernel internal users that set watches like audit and nfsd, but I think they are also only interested in inodes that have a path at the time that the mark is setup. Thanks, Amir.