Hi,

Had some time to re-do some testing.

1) Pipewire (its wireplumber daemon) set a watch on the children of the
directory /dev via inotify. I removed that (disabled pipewire), but I still
had the fsnotify overhead when using aio/io_uring at high IOPS across
several threads on several cores.

2) I then noticed that udev set a watch (via inotify) on the files in /dev.
This is due to a rule in /usr/lib/udev/rules.d/60-block.rules:

# watch metadata changes, caused by tools closing the device node which was opened for writing
ACTION!="remove", SUBSYSTEM=="block", \
  KERNEL=="loop*|mmcblk*[0-9]|msblk*[0-9]|mspblk*[0-9]|nvme*|sd*|vd*|xvd*|bcache*|cciss*|dasd*|ubd*|ubi*|scm*|pmem*|nbd*|zd*", \
  OPTIONS+="watch"

I removed "nvme*" from this rule (I am testing on /dev/nvme0n1), and then
the fsnotify overhead finally disappeared.

3) I think there is nothing wrong with Pipewire and udev; they simply want
to watch what is going on in /dev. I don't think they are interested in
quantifying millions of read/write accesses per second to a file they watch
(and that is not the goal/charter of fsnotify either). There are other
tools for that, optimized for that task.

To avoid the overhead, I think the fsnotify subsystem should be refined to
factor in high-frequency read/write file access. Or the code doing
high-frequency fsnotify calls (like aio/io_uring) should do that factoring
itself. Or the user should be given a way to turn off fsnotify calls for
reads/writes on a specific file.

Right now, the only way to work around the CPU overhead without hacking is
to disable the services watching /dev. That means people can't use these
services anymore, which doesn't seem right.

Regards,

Pierre

> -----Original Message-----
> From: Pierre Labat
> Sent: Monday, August 14, 2023 9:31 AM
> To: Jeff Moyer <jmoyer@xxxxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>; 'io-uring@xxxxxxxxxxxxxxx' <io-uring@xxxxxxxxxxxxxxx>
> Subject: RE: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
>
> Hi Jeff,
>
> Indeed, by default, in my configuration, pipewire is running.
> When I can re-test, I'll disable it and see if that removes the problem.
> Thanks for the hint!
>
> Pierre
>
> > -----Original Message-----
> > From: Jeff Moyer <jmoyer@xxxxxxxxxx>
> > Sent: Wednesday, August 9, 2023 10:15 AM
> > To: Pierre Labat <plabat@xxxxxxxxxx>
> > Cc: Jens Axboe <axboe@xxxxxxxxx>; 'io-uring@xxxxxxxxxxxxxxx' <io-uring@xxxxxxxxxxxxxxx>
> > Subject: Re: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
> >
> > CAUTION: EXTERNAL EMAIL. Do not click links or open attachments unless
> > you recognize the sender and were expecting this message.
> >
> > Pierre Labat <plabat@xxxxxxxxxx> writes:
> >
> > > Micron Confidential
> > >
> > > Hi Jeff and Jens,
> > >
> > > About "FAN_MODIFY fsnotify watch set on /dev".
> > >
> > > I was using the Fedora 34 distro (with a 6.3.9 kernel) and fio, without
> > > any particular/specific settings.
> > > I tried to see what could be watching /dev but failed at that.
> > > I used the inotify-info tool, but that displays watchers using the
> > > inotify interface, and nothing was watching /dev via inotify.
> > > I need to figure out how to do the same for the fanotify interface.
> > > I'll look at it again and let you know.
> >
> > You wouldn't happen to be running pipewire, would you?
> >
> > https://gitlab.freedesktop.org/pipewire/pipewire/-/commit/88f0dbd6fcd0a412fc4bece22afdc3ba0151e4cf
> >
> > -Jeff
> >
> > >
> > > Regards,
> > >
> > > Pierre
> > >
> > >> -----Original Message-----
> > >> From: Jens Axboe <axboe@xxxxxxxxx>
> > >> Sent: Tuesday, August 8, 2023 2:41 PM
> > >> To: Jeff Moyer <jmoyer@xxxxxxxxxx>; Pierre Labat <plabat@xxxxxxxxxx>
> > >> Cc: 'io-uring@xxxxxxxxxxxxxxx' <io-uring@xxxxxxxxxxxxxxx>
> > >> Subject: [EXT] Re: FYI, fsnotify contention with aio and io_uring.
> > >>
> > >> On 8/7/23 2:11 PM, Jeff Moyer wrote:
> > >> > Hi, Pierre,
> > >> >
> > >> > Pierre Labat <plabat@xxxxxxxxxx> writes:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> This is FYI. Maybe you already know about this, but in case you
> > >> >> don't...
> > >> >>
> > >> >> I was pushing the limit of the number of NVMe read IOPS that FIO
> > >> >> + the Linux OS can handle. For that, I have something special
> > >> >> under the Linux nvme driver. As a consequence, I am not limited
> > >> >> by whatever the NVMe SSD max IOPS or IO latency would be.
> > >> >>
> > >> >> As I cranked up the number of system cores and FIO jobs doing
> > >> >> direct 4k random reads on /dev/nvme0n1, I hit a wall. The IOPS
> > >> >> scaling slows (less than linear), and at around 15 FIO jobs on 15
> > >> >> core threads, the overall IOPS actually goes down as I add more
> > >> >> FIO jobs. For example, on a system with 24 cores/48 threads, when
> > >> >> I go beyond 15 FIO jobs, the overall IOPS starts to go down.
> > >> >>
> > >> >> This happens the same way for io_uring and aio. I was using kernel
> > >> >> version 6.3.9.
> > >> >> Using one namespace (/dev/nvme0n1).
> > >> >
> > >> > [snip]
> > >> >
> > >> >> As you can see, 76% of the CPU on the box is sucked up by
> > >> >> lockref_get_not_zero() and lockref_put_return(). Looking at the
> > >> >> code, there is contention when io_uring calls fsnotify_access().
> > >> >
> > >> > Is there a FAN_MODIFY fsnotify watch set on /dev? If so, it
> > >> > might be a good idea to find out what set it and why.
> > >>
> > >> This would be my guess too; some distros do seem to do that. The
> > >> notification bits scale horribly, nobody should use it for anything
> > >> high performance...
> > >>
> > >> --
> > >> Jens Axboe