Re: [PATCH RFC 2/2] io_uring: add support for getdents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 24 2023 at 08:43:00 +0900, Dominique Martinet quoth thus:

> Dave Chinner wrote on Mon, Apr 24, 2023 at 08:40:45AM +1000:
> > This doesn't actually introduce non-blocking getdents operations, so
> > what's the point? If it just shuffles the getdents call off to a
> > background thread, why bother with io_uring in the first place?
> 
> As said in the cover letter my main motivation really is simplifying the
> userspace application:
>  - style-wise, mixing in plain old getdents(2) or readdir(3) in the
> middle of an io_uring handling loop just feels wrong; but this may just
> be my OCD talking.
>  - in my understanding io_uring has its own thread pool, so even if the
> actual getdents is blocking other IOs can progress (assuming there is
> less blocked getdents than threads), without having to build one's own
> extra thread pool next to the uring handling.
> Looking at io_uring/fs.c the other "metadata related" calls there also
> use the synchronous APIs (renameat, unlinkat, mkdirat, symlinkat and
> linkat all do), so I didn't think of that as a problem in itself.

Having written code to create additional worker threads in addition
to using io_uring as a main loop, I'm glad to see this proposal back
again.  I think the original work was shot down much too hastily based
on the file positioning issues.  Really only two cases of directory
position are useful*, which can be expressed either as an off_t
(0 = rewind, -1 = curpos), or a single bit flag.  As I understand the
code, the rewind case shouldn't be any problem.  From a practical
viewpoint, I don't think non-blocking would see a lot of use, but it
wouldn't hurt.

This also seems like a good place to bring up a point I made with
the last attempt at this code.  You're missing an optimization here.
getdents knows whether it is returning a buffer because the next entry
won't fit versus because there are no more entries.  As it doesn't
return that information, callers must always keep calling it back
until EOF.  This means a completely unnecessary call is made for
every open directory.  In other words, for a directory scan where
the buffers are large enough to not overflow, that literally twice
as many calls are made to getdents as necessary.  As io_uring is
in-kernel, it could use an internal interface to getdents which would
return an EOF indicator along with the (probably non-empty) buffer.
io_uring would then return that flag with the CQE.


(* As an aside, the only place I've ever seen a non-zero lseek on a
directory, is in a very resource limited environment, e.g. too small
open files limit.  In the case of a depth-first directory scan, it
must close directories before completely reading them, and reopen /
lseek to their previous position in order to continue.  This scenario
is certainly not worth bothering with for io_uring.)

> > Filesystems like XFS can easily do non-blocking getdents calls - we
> > just need the NOWAIT plumbing (like we added to the IO path with
> > IOCB_NOWAIT) to tell the filesystem not to block on locks or IO.
> > Indeed, filesystems often have async readahead built into their
> > getdents paths (XFS does), so it seems to me that we really want
> > non-blocking getdents to allow filesystems to take full advantage of
> > doing work without blocking and then shuffling the remainder off to
> > a background thread when it actually needs to wait for IO....
> 
> I believe that can be done without any change of this API, so that'll be
> a very welcome addition when it is ready; I don't think the adding the
> uring op should wait on this if we can agree a simple wrapper API is
> good enough (or come up with a better one if someone has a Good Idea)
> 
> (looking at io_uring/rw.c for comparison, io_getdents() will "just" need
> to be adjusted to issue an async req if IO_URING_F_NONBLOCK is set, and
> the poll/retry logic sorted out)
> 
> Thanks,
> -- 
> Dominique Martinet | Asmadeus



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux