On Wed, Jan 15, 2025 at 12:27 PM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2025 at 6:55 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 14, 2025 at 4:38 AM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
> > >
> > > The Linux kernel does buffered reads and writes using the page cache
> > > layer, where the filesystem reads and writes are offloaded to the
> > > VM/MM layer. The VM layer does a predictive readahead of data by
> > > optionally asking the filesystem to read more data asynchronously
> > > than what was requested.
> > >
> > > The VFS layer maintains a dentry cache which gets populated during
> > > access of dentries (either during readdir/getdents or during lookup).
> > > These dentries within a directory actually form the address space of
> > > the directory, which is read sequentially during getdents. For network
> > > filesystems, the dentries are also looked up during revalidation.
> > >
> > > During sequential getdents, it makes sense to perform a readahead
> > > similar to file reads. Even for revalidations and dentry lookups,
> > > heuristics can be maintained to know whether the lookups within the
> > > directory are sequential in nature. With this, the dentry cache can
> > > be pre-populated for a directory, even before the dentries are
> > > accessed, thereby boosting performance. This could give even more
> > > benefit to network filesystems by avoiding costly round trips to
> > > the server.
> > >
> >
> > I believe you are referring to READDIRPLUS, which is quite common
> > for network protocols and also supported by FUSE.
>
> This discussion is not entirely about readdirplus, but it is definitely
> a part of it.
> I'm suggesting doing the next set of readdir() calls in advance, so
> that the data needed to serve them is already in the cache.
> I'm also suggesting artificially doing a readdir to avoid sequential
> revalidation of each dentry, or a readdirplus to avoid a stat of each
> inode corresponding to these dentries.

Well, if readdirplus is implemented, then "readaheadplus" could be
implemented by async io_uring readdirplus commands. Right?
The io_uring command would have to know to chain the following
readdirplus commands with the offset returned from the previous
readdirplus response, but that should be doable, I think?
(A rough user-space sketch of the idea follows further below.)

> >
> > Unlike network protocols, FUSE decides by server configuration and
> > heuristics whether to "fuse_use_readdirplus" - specifically, in
> > readdirplus_auto mode, FUSE starts with readdirplus, but if nothing
> > has called lookup on the directory inode by the time of the next
> > getdents call, it stops using readdirplus.
> >
> > I personally ran into the problem that I would like to control from
> > the application, which knows whether it is doing "ls" or "ls -l",
> > whether a specific getdents() will use FUSE readdirplus or not,
> > because in some situations where "ls -l" is not needed, that can
> > avoid a lot of unneeded IO.
> >
> > I do not know if implementing readdirplus (i.e. populating the inode
> > and dentry) makes sense for disk filesystems, but if we do it at the
> > VFS level, there has to be an API to control, or at least opt out of,
> > readdirplus, like with readahead.
>
> That would be a great knob to have for network filesystems. We have to
> rely on heuristics today to predict which of these patterns the
> workload is using.
> It seems like the demand existed for a long time.
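For illustration only, here is a rough user-space approximation of that
chained "readaheadplus", assuming liburing and glibc >= 2.30 are
available. Mainline io_uring has no getdents opcode, so the sketch
reads the directory synchronously with getdents64() and only batches
the per-entry statx() calls (the "plus" part) asynchronously via
IORING_OP_STATX. The helper names prefetch_dir() and flush_batch() are
made up for this sketch; this is not the in-kernel mechanism being
discussed, which would chain the getdents side as well.

/*
 * Rough user-space approximation of a "readdirplus prefetch":
 * read the directory synchronously with getdents64() and queue the
 * per-entry statx() calls asynchronously through io_uring.
 *
 * Build with: gcc -O2 -o dirprefetch dirprefetch.c -luring
 */
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <liburing.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DENTS_BUF	(64 * 1024)	/* one getdents64() batch */
#define BATCH		64		/* statx requests in flight */

/* Submit everything queued so far and wait until it has all completed. */
static void flush_batch(struct io_uring *ring, unsigned queued)
{
	struct io_uring_cqe *cqe;

	io_uring_submit(ring);
	while (queued--) {
		if (io_uring_wait_cqe(ring, &cqe) == 0)
			io_uring_cqe_seen(ring, cqe);
	}
}

/* Hypothetical helper name, not a kernel or liburing API. */
static int prefetch_dir(const char *path)
{
	static char buf[DENTS_BUF];
	static struct statx stx[BATCH];
	struct io_uring ring;
	ssize_t nread;
	int dfd;

	dfd = open(path, O_RDONLY | O_DIRECTORY);
	if (dfd < 0)
		return -1;
	if (io_uring_queue_init(BATCH, &ring, 0) < 0) {
		close(dfd);
		return -1;
	}

	while ((nread = getdents64(dfd, buf, sizeof(buf))) > 0) {
		unsigned queued = 0;

		for (ssize_t off = 0; off < nread; ) {
			struct dirent64 *d = (struct dirent64 *)(buf + off);

			off += d->d_reclen;
			if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
				continue;

			/* Never NULL: we flush every BATCH entries. */
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			/* Queue an async stat of this entry relative to dfd. */
			io_uring_prep_statx(sqe, dfd, d->d_name, 0,
					    STATX_BASIC_STATS, &stx[queued]);
			if (++queued == BATCH) {
				flush_batch(&ring, queued);
				queued = 0;
			}
		}
		/* Drain the tail before buf and stx are reused. */
		flush_batch(&ring, queued);
	}

	io_uring_queue_exit(&ring);
	close(dfd);
	return 0;
}

int main(int argc, char **argv)
{
	return prefetch_dir(argc > 1 ? argv[1] : ".") ? 1 : 0;
}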
The man page for posix_fadvise(2) says:
"Programs can use posix_fadvise() to announce an intention to access
file data in a specific pattern in the future, thus allowing the kernel
to perform appropriate optimizations."

I do not read this as being limited to non-directory files, and indeed
fadvise() can be called on directories, but others could argue that this
is an API abuse.

Mind sending a patch for POSIX_FADV_{NO,}READDIRPLUS?
Make sure it fails with -ENOTDIR on a non-directory fd, and be ready to
face the inevitable bikeshedding ;)

Thanks,
Amir.
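As a sketch only, this is what the application side of such a knob
might look like. POSIX_FADV_READDIRPLUS is hypothetical: neither the
name nor the value below exists in any kernel or libc today, so a
current kernel would reject that call with EINVAL rather than honour
it. The POSIX_FADV_WILLNEED call is the part that a directory fd is
not rejected for today, which is the "API abuse" question above.

/*
 * Sketch of the proposed knob from the application's point of view.
 * POSIX_FADV_READDIRPLUS is hypothetical: the name and value are made
 * up here, so a current kernel will fail that call with EINVAL.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef POSIX_FADV_READDIRPLUS
#define POSIX_FADV_READDIRPLUS	8	/* invented value, not in the UAPI */
#endif

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : ".";
	int dfd = open(path, O_RDONLY | O_DIRECTORY);
	int err;

	if (dfd < 0) {
		perror("open");
		return 1;
	}

	/* Not rejected on a directory fd today; the effect is fs-dependent. */
	err = posix_fadvise(dfd, 0, 0, POSIX_FADV_WILLNEED);
	printf("POSIX_FADV_WILLNEED: %s\n", err ? strerror(err) : "ok");

	/*
	 * The proposed advice: "the coming getdents() calls will be
	 * followed by a stat() of each entry, so prefetch attributes too".
	 * Per the proposal it should return ENOTDIR on a non-directory fd.
	 */
	err = posix_fadvise(dfd, 0, 0, POSIX_FADV_READDIRPLUS);
	printf("POSIX_FADV_READDIRPLUS: %s\n", err ? strerror(err) : "ok");

	close(dfd);
	return 0;
}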