Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt

On Wed, Apr 26, 2023 at 11:25:15AM +0800, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 04:46:03PM +0100, Pavel Begunkov wrote:
> > On 4/25/23 16:25, Jens Axboe wrote:
> > > > On 4/25/23 9:07 AM, Ming Lei wrote:
> > > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
> > > > > > On 4/25/23 8:42 AM, Ming Lei wrote:
> > > > > > On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
> > > > > > > > On 4/24/23 8:50 PM, Ming Lei wrote:
> > > > > > > > On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
> > > > > > > > > > On 4/24/23 8:13 PM, Ming Lei wrote:
> > > > > > > > > > On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
> > > > > > > > > > > > On 4/24/23 6:57 PM, Ming Lei wrote:
> > > > > > > > > > > > On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
> > > > > > > > > > > > > > On 4/24/23 1:30 AM, Ming Lei wrote:
> > > > > > > > > > > > > > On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
> > > > > > > > > > > > > > > Add an opdef bit for them, and set it for the opcodes where we always
> > > > > > > > > > > > > > > need io-wq punt. With that done, exclude them from the file_can_poll()
> > > > > > > > > > > > > > > check in terms of whether or not we need to punt them if any of the
> > > > > > > > > > > > > > > NO_OFFLOAD flags are set.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >   io_uring/io_uring.c |  2 +-
> > > > > > > > > > > > > > >   io_uring/opdef.c    | 22 ++++++++++++++++++++--
> > > > > > > > > > > > > > >   io_uring/opdef.h    |  2 ++
> > > > > > > > > > > > > > >   3 files changed, 23 insertions(+), 3 deletions(-)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > > > > > > > > > > > > > > index fee3e461e149..420cfd35ebc6 100644
> > > > > > > > > > > > > > > --- a/io_uring/io_uring.c
> > > > > > > > > > > > > > > +++ b/io_uring/io_uring.c
> > > > > > > > > > > > > > > @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
> > > > > > > > > > > > > > >   		return -EBADF;
> > > > > > > > > > > > > > >   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
> > > > > > > > > > > > > > > -	    (!req->file || !file_can_poll(req->file)))
> > > > > > > > > > > > > > > +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
> > > > > > > > > > > > > > >   		issue_flags &= ~IO_URING_F_NONBLOCK;
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I guess the check should be !def->always_iowq?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > How so? Nobody that takes pollable files should/is setting
> > > > > > > > > > > > > ->always_iowq. If we can poll the file, we should not force inline
> > > > > > > > > > > > > submission. Basically the ones setting ->always_iowq always do -EAGAIN
> > > > > > > > > > > > > returns if nonblock == true.
> > > > > > > > > > > > 
> > > > > > > > > > > > I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
> > > > > > > > > > > > these OPs won't return -EAGAIN, then run in the current task context
> > > > > > > > > > > > directly.
> > > > > > > > > > > 
> > > > > > > > > > > > Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
> > > > > > > > > > > it :-)
> > > > > > > > > > 
> > > > > > > > > > But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
> > > > > > > > > > not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
> > > > > > > > > > ->always_iowq is a bit confusing?
> > > > > > > > > 
> > > > > > > > > > Yeah naming isn't that great, I can see how that's a bit confusing. I'll
> > > > > > > > > be happy to take suggestions on what would make it clearer.
> > > > > > > > 
> > > > > > > > Except for the naming, I am also wondering why these ->always_iowq OPs
> > > > > > > > aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
> > > > > > > > shouldn't improve performance by doing so because these OPs are supposed
> > > > > > > > to be slow and always slept, not like others(buffered writes, ...),
> > > > > > > > can you provide one hint about not offloading these OPs? Or is it just that
> > > > > > > > NO_OFFLOAD needs to not offload every OPs?
> > > > > > > 
> > > > > > > The whole point of NO_OFFLOAD is that items that would normally be
> > > > > > > passed to io-wq are just run inline. This provides a way to reap the
> > > > > > > benefits of batched submissions and syscall reductions. Some opcodes
> > > > > > > will just never be async, and io-wq offloads are not very fast. Some of
> > > > > > 
> > > > > > Yeah, seems io-wq is much slower than inline issue, maybe it needs
> > > > > > to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
> > > > > 
> > > > > Indeed, depending on what is being linked, you may see io-wq activity
> > > > > which is not ideal.
> > > > 
> > > > That is why I prefer to fused command for ublk zero copy, because the
> > > > registering buffer approach suggested by Pavel and Ziyang has to link
> > > > register buffer OP with the actual IO OP, and it is observed that
> > > > IOPS drops to 1/2 in 4k random io test with registered buffer approach.
> > > 
> > > It'd be worth looking into if we can avoid io-wq for link execution, as
> > > that'd be a nice win overall too. IIRC, there's no reason why it can't
> > > be done like initial issue rather than just a lazy punt to io-wq.
> > 
> > I might've missed a part of the discussion, but links are _usually_
> > executed by task_work, e.g.
> > 
> > io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue()
> 
> Good catch, just figured out that /dev/ublkcN & backing file isn't opened by
> O_NONBLOCK.
> 
> But -EAGAIN is still returned from io_write() even though the regular
> file is opened with O_DIRECT, at least on btrfs & xfs, so io wq is still
> scheduled. Not look into the exact reason yet, and not see such issue for
> block device. Anyway, it isn't related with io wq.

That is because call_write_iter() returns -EAGAIN when IOCB_NOWAIT is set,
which is exactly the case Jens's patchset is addressing.
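
For anyone following along, here is a minimal userspace sketch of how I read
the quoted io_issue_sqe() hunk. It is not kernel code: the sk_* names, the
struct and the flag bit values are all made up for illustration, and
sk_issue() only models an operation that cannot make progress without
blocking (such as the write above that gets -EAGAIN under IOCB_NOWAIT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit values for this sketch only; not the kernel's definitions. */
#define SK_F_NONBLOCK   (1u << 0)
#define SK_F_NO_OFFLOAD (1u << 1)

/* Hypothetical stand-in for the request state the quoted hunk inspects. */
struct sk_req {
	bool has_file;    /* models req->file != NULL */
	bool can_poll;    /* models file_can_poll(req->file) */
	bool always_iowq; /* models def->always_iowq */
};

/*
 * Mirrors the quoted condition: with NO_OFFLOAD set, clear NONBLOCK when
 * there is no pollable file, or when the opcode is marked always_iowq.
 */
unsigned int sk_adjust_flags(const struct sk_req *req, unsigned int issue_flags)
{
	if ((issue_flags & SK_F_NO_OFFLOAD) &&
	    (!req->has_file || !req->can_poll || req->always_iowq))
		issue_flags &= ~SK_F_NONBLOCK;
	return issue_flags;
}

/*
 * Models an operation that would have to block: a nonblocking attempt
 * returns -EAGAIN (the caller would then punt to io-wq), while a blocking
 * attempt runs inline and returns 0.
 */
int sk_issue(const struct sk_req *req, unsigned int issue_flags)
{
	issue_flags = sk_adjust_flags(req, issue_flags);
	if (issue_flags & SK_F_NONBLOCK)
		return -EAGAIN;
	return 0;
}

/* Small self-check covering the cases discussed in this thread. */
void sk_selfcheck(void)
{
	struct sk_req regular = { .has_file = true, .can_poll = false, .always_iowq = false };
	struct sk_req pollable = { .has_file = true, .can_poll = true, .always_iowq = false };
	struct sk_req marked = { .has_file = true, .can_poll = true, .always_iowq = true };

	/* Without NO_OFFLOAD, the non-pollable file still gets -EAGAIN. */
	assert(sk_issue(&regular, SK_F_NONBLOCK) == -EAGAIN);
	/* With NO_OFFLOAD, it runs inline instead. */
	assert(sk_issue(&regular, SK_F_NONBLOCK | SK_F_NO_OFFLOAD) == 0);
	/* Pollable files keep NONBLOCK even with NO_OFFLOAD. */
	assert(sk_issue(&pollable, SK_F_NONBLOCK | SK_F_NO_OFFLOAD) == -EAGAIN);
	/* always_iowq opcodes run inline under NO_OFFLOAD. */
	assert(sk_issue(&marked, SK_F_NONBLOCK | SK_F_NO_OFFLOAD) == 0);
}
```

The third case is the one Jens described earlier: a pollable file is not
forced inline, so in this model the flag stays set and the attempt remains
nonblocking.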


Thanks,
Ming



