Re: [LSF/MM/BPF TOPIC] Predictive readahead of dentries

On Wed, Jan 15, 2025 at 12:27 PM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2025 at 6:55 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 14, 2025 at 4:38 AM Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
> > >
> > > The Linux kernel does buffered reads and writes using the page cache
> > > layer, where the filesystem reads and writes are offloaded to the
> > > VM/MM layer. The VM layer does predictive readahead of data by
> > > optionally asking the filesystem to asynchronously read more data
> > > than was requested.
> > >
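For regular files, userspace can already reinforce this readahead with posix_fadvise(2). A minimal sketch, assuming a Linux/glibc environment: read_file_sequential() is a hypothetical helper, not an existing API. It hints POSIX_FADV_SEQUENTIAL before reading, and because the hint is only advisory, its return value is ignored.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to cap bytes of a file after telling the
 * kernel the access will be sequential, so the page cache can read ahead
 * more aggressively. Returns bytes read, or -1 on open/read error. */
ssize_t read_file_sequential(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0 covers the whole file. posix_fadvise() returns an
     * error number rather than -1; it is ignored since it is a hint. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```
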
> > > The VFS layer maintains a dentry cache which gets populated during
> > > access of dentries (either during readdir/getdents or during lookup).
> > > The dentries within a directory actually form the address space for
> > > the directory, which is read sequentially during getdents. For network
> > > filesystems, the dentries are also looked up during revalidate.
> > >
> > > During sequential getdents, it makes sense to perform a readahead
> > > similar to file reads. Even for revalidations and dentry lookups,
> > > heuristics could be maintained to detect whether the lookups within
> > > the directory are sequential in nature. With this, the dentry cache
> > > could be pre-populated for a directory even before the dentries are
> > > accessed, thereby improving performance. This could
> > > give even more benefits for network filesystems by avoiding costly
> > > round trips to the server.
> > >
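To make the sequential-lookup heuristic concrete, here is a minimal sketch of per-directory state, loosely modeled on how file readahead ramps up its window. Everything in it is a hypothetical illustration rather than existing kernel code: struct dir_ra_state, dir_ra_on_access(), the 4 and 64 entry window sizes, and the assumption that each lookup can be mapped to an entry index (for example, its position in the last readdir listing).

```c
#define DIR_RA_INIT 4u  /* hypothetical initial window, in entries */
#define DIR_RA_MAX 64u  /* hypothetical cap on the window */

/* Hypothetical per-directory readahead state. */
struct dir_ra_state {
    long next_expected; /* entry index we expect if access stays sequential */
    unsigned window;    /* current readahead window, 0 means disabled */
};

/* Record an access to the entry at 'index' and return how many entries
 * past it should be prefetched into the dentry cache. */
unsigned dir_ra_on_access(struct dir_ra_state *s, long index)
{
    if (index == s->next_expected) {
        /* Sequential: start the window, or double it up to the cap. */
        s->window = s->window ? s->window * 2 : DIR_RA_INIT;
        if (s->window > DIR_RA_MAX)
            s->window = DIR_RA_MAX;
    } else {
        /* Out of order: stop prefetching until a new sequential run. */
        s->window = 0;
    }
    s->next_expected = index + 1;
    return s->window;
}
```

For accesses 0, 1, 2, 3 it returns 4, 8, 16, 32; a jump to entry 10 resets the window to 0, and entry 11 restarts it at 4. Doubling with a cap keeps the cost of a wrong guess bounded.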
> >
> > I believe you are referring to READDIRPLUS, which is quite common
> > for network protocols and also supported by FUSE.
> This discussion is not completely about readdirplus, but definitely is
> a part of it.
> I'm suggesting doing the next set of readdir() calls in advance, so
> that the data needed to serve those are already in the cache.
> I'm also suggesting issuing a speculative readdir to avoid revalidating
> each dentry one by one, or a readdirplus to avoid a stat of each
> inode corresponding to these dentries.

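The per-inode cost that readdirplus is meant to remove shows up in the "ls -l" pattern: one readdir pass followed by one stat per entry. A minimal userspace sketch of that pattern, where list_with_stat() is a hypothetical helper; on a network filesystem each fstatat() may cost a server round trip unless the attributes are already cached.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper mimicking "ls -l": list a directory, then stat
 * every entry. Returns the number of successful per-entry stats (or -1
 * if the directory cannot be opened); *total_size gets the sum of
 * st_size over those entries. */
long list_with_stat(const char *path, long long *total_size)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    long stats = 0;
    *total_size = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        struct stat st;
        /* One attribute fetch per entry: the work readdirplus batches. */
        if (fstatat(dirfd(dir), de->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
            stats++;
            *total_size += st.st_size;
        }
    }
    closedir(dir);
    return stats;
}
```
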
Well, if readdirplus is implemented, then "readaheadplus" could be
implemented with async io_uring readdirplus commands, right?
The io_uring command would have to know to chain each following
readdirplus command with the offset returned in the previous
readdirplus response, but that should be doable, I think?

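As far as I know, mainline io_uring has no getdents/readdir opcode, so the sketch below shows only the offset chaining described above, done synchronously with getdents64(2): each batch resumes at the d_off of the last entry returned by the previous batch. count_entries_chained() is a hypothetical helper; in a hypothetical async version, each submitted request would carry that saved offset as its starting point.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), defined locally because
 * older glibc versions do not export it. */
struct linux_dirent64 {
    uint64_t d_ino;
    int64_t d_off;           /* offset of the next entry */
    unsigned short d_reclen; /* length of this record */
    unsigned char d_type;
    char d_name[];
};

/* Hypothetical helper: count directory entries (including "." and "..")
 * by reading in small batches, explicitly chaining each batch from the
 * offset saved out of the previous one. Returns -1 on error. */
long count_entries_chained(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    /* Small buffer so a modest directory needs several batches; 512 bytes
     * always fits at least one record, even with a 255-byte name. */
    _Alignas(8) char buf[512];
    off_t next = 0; /* offset carried from one batch to the next */
    long count = 0;

    for (;;) {
        if (lseek(fd, next, SEEK_SET) == (off_t)-1) {
            close(fd);
            return -1;
        }
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break; /* end of directory */
        for (long pos = 0; pos < n;) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            count++;
            next = d->d_off; /* resume point for the following batch */
            pos += d->d_reclen;
        }
    }
    close(fd);
    return count;
}
```
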
> >
> > Unlike network protocols, FUSE decides by server configuration and
> > heuristics whether to use readdirplus (fuse_use_readdirplus()). Specifically,
> > in readdirplus_auto mode, FUSE starts with readdirplus, but if nothing has
> > called lookup on the directory inode by the time of the next getdents call,
> > it stops using readdirplus.
> >
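The readdirplus_auto behaviour described above can be modeled in a few lines. This is a simplified userspace model that loosely follows fuse_use_readdirplus() in fs/fuse as I understand it; struct rdplus_dir, rdplus_note_lookup() and rdplus_auto_use() are illustrative names, not kernel APIs.

```c
#include <stdbool.h>

/* Illustrative per-directory state (not the kernel's struct fuse_inode). */
struct rdplus_dir {
    bool lookup_seen; /* a lookup happened since the last getdents decision */
};

/* Called when a lookup is done in this directory: a hint that the
 * caller wants attributes, e.g. "ls -l". */
void rdplus_note_lookup(struct rdplus_dir *d)
{
    d->lookup_seen = true;
}

/* Auto mode: decide whether the getdents call starting at directory
 * position 'pos' should use READDIRPLUS. */
bool rdplus_auto_use(struct rdplus_dir *d, long pos)
{
    bool seen = d->lookup_seen;
    d->lookup_seen = false; /* consume the hint (test-and-clear) */
    if (seen)
        return true;        /* lookups followed the last batch: keep going */
    return pos == 0;        /* a fresh listing always starts with readdirplus */
}
```

A plain "ls" gets readdirplus only for its first batch, while "ls -l", which looks up entries between batches, keeps getting it.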
> > I personally ran into this problem: I would like the application, which
> > knows whether it is doing "ls" or "ls -l", to control whether a specific
> > getdents() uses FUSE readdirplus, because in situations where "ls -l" is
> > not needed, that can avoid a lot of unneeded IO.
> >
> > I do not know if implementing readdirplus (i.e. populating inode and dentry)
> > makes sense for disk filesystems, but if we do it at the VFS level, there has
> > to be an API to control, or at least opt out of, readdirplus, like with readahead.
> That would be a great knob to have for network filesystems. We have to
> rely on heuristics today to predict which of these patterns the
> workload is using.
>

It seems like the demand has existed for a long time.
The man page for posix_fadvise(2) says:
"Programs can use posix_fadvise() to announce an intention to access file data
 in a specific pattern in the future, thus allowing the kernel to perform
 appropriate optimizations."

I do not read this as limited to non-directory files, and indeed fadvise() can
be called on directories, but others could argue that this is an API abuse.

Mind sending a patch for POSIX_FADV_{NO,}READDIRPLUS?
Make sure it fails with -ENOTDIR on non-directories and be ready to face the
inevitable bikeshedding ;)

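A minimal sketch of the argument checking such a patch might do. POSIX_FADV_READDIRPLUS and POSIX_FADV_NOREADDIRPLUS do not exist today, so the constants below carry a HYP_ prefix and placeholder values, and enum rdplus_pref and hyp_fadvise_readdirplus() are hypothetical names. The function models the kernel-side check, returning a negative errno in kernel style, with -ENOTDIR for non-directories as suggested.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical advice values: NOT real Linux or POSIX constants. */
#define HYP_POSIX_FADV_READDIRPLUS 100
#define HYP_POSIX_FADV_NOREADDIRPLUS 101

/* Hypothetical per-open-directory preference the advice would set. */
enum rdplus_pref { RDPLUS_DEFAULT, RDPLUS_ON, RDPLUS_OFF };

/* Validate the proposed advice against the file's mode and record it.
 * Returns 0, -EINVAL for unknown advice, or -ENOTDIR for non-directories;
 * *pref is left unchanged on error. */
int hyp_fadvise_readdirplus(mode_t mode, int advice, enum rdplus_pref *pref)
{
    if (advice != HYP_POSIX_FADV_READDIRPLUS &&
        advice != HYP_POSIX_FADV_NOREADDIRPLUS)
        return -EINVAL;
    if (!S_ISDIR(mode))
        return -ENOTDIR; /* only meaningful on directories */
    *pref = (advice == HYP_POSIX_FADV_READDIRPLUS) ? RDPLUS_ON : RDPLUS_OFF;
    return 0;
}
```

A caller would pass st.st_mode from fstat(2) on the directory fd. Unknown advice is rejected before the file type is checked, so an unsupported value yields -EINVAL even on a regular file.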
Thanks,
Amir.




