Re: io_uring force_nonblock vs POSIX_FADV_WILLNEED

Jens Axboe <axboe@xxxxxxxxx> · Sat, 1 Feb 2020 09:22:45 -0700

On 2/1/20 2:43 AM, Andres Freund wrote:
> Hi
> 
> Currently io_uring executes fadvise in submission context except for
> DONTNEED:
> 
> static int io_fadvise(struct io_kiocb *req, struct io_kiocb **nxt,
> 		      bool force_nonblock)
> {
> ...
> 	/* DONTNEED may block, others _should_ not */
> 	if (fa->advice == POSIX_FADV_DONTNEED && force_nonblock)
> 		return -EAGAIN;
> 
> which makes sense for POSIX_FADV_{NORMAL, RANDOM, SEQUENTIAL}, but doesn't
> seem like it's true for POSIX_FADV_WILLNEED?
> 
> As far as I can tell POSIX_FADV_WILLNEED synchronously starts readahead,
> including page allocation etc., which of course might trigger quite a
> bit of blocking. The fs also quite possibly needs to read metadata.
> 
> 
> Seems like either WILLNEED would have to always be deferred, or
> force_page_cache_readahead, __do_page_cache_readahead etc. would need to
> be wired up to know not to block. Including returning EAGAIN, despite
> force_page_cache_readahead and generic_readahead() intentionally ignoring
> return values / errors.
> 
> I guess it's also possible to just add a separate precheck that looks
> whether there's any IO needing to be done for the range. That could
> potentially also be used to make DONTNEED nonblocking in case everything
> is clean already, which seems like it could be nice. But that seems
> weird modularity-wise.

Good point, we can block on the read-ahead. Which is counterintuitive,
but true.

I'll queue up the below for now, better safe than sorry.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index fb5c5b3e23f4..1464e4c9b04c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2728,8 +2728,7 @@ static int io_fadvise(struct io_kiocb *req, struct io_kiocb **nxt,
 	struct io_fadvise *fa = &req->fadvise;
 	int ret;
 
-	/* DONTNEED may block, others _should_ not */
-	if (fa->advice == POSIX_FADV_DONTNEED && force_nonblock)
+	if (force_nonblock)
 		return -EAGAIN;
 
 	ret = vfs_fadvise(req->file, fa->offset, fa->len, fa->advice);
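
To make the behaviour change easy to see, here is a rough userspace
sketch of just the check at the top of io_fadvise(), before and after
the patch. The helper names fadvise_precheck_old() and
fadvise_precheck_new() are made up for illustration and do not exist in
the kernel; a return of 0 stands for "run vfs_fadvise() inline" and
-EAGAIN for "don't run it in the nonblocking submission path".

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helpers: they only model the early-return check in
 * io_fadvise(), not the actual vfs_fadvise() call that follows it.
 */

/* Before the patch: only DONTNEED bails out when force_nonblock is set. */
static int fadvise_precheck_old(int advice, bool force_nonblock)
{
	if (advice == POSIX_FADV_DONTNEED && force_nonblock)
		return -EAGAIN;
	return 0;
}

/* After the patch: every advice value bails out when force_nonblock is set. */
static int fadvise_precheck_new(int advice, bool force_nonblock)
{
	(void)advice;	/* the advice value no longer matters for this check */
	if (force_nonblock)
		return -EAGAIN;
	return 0;
}
```

With force_nonblock set, the old check lets POSIX_FADV_WILLNEED through
to run inline, while the new one returns -EAGAIN for it like it does for
everything else. With force_nonblock clear, both return 0.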

-- 
Jens Axboe