On Tue, Apr 25, 2023 at 09:25:35AM -0600, Jens Axboe wrote: > On 4/25/23 9:07?AM, Ming Lei wrote: > > On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote: > >> On 4/25/23 8:42?AM, Ming Lei wrote: > >>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote: > >>>> On 4/24/23 8:50?PM, Ming Lei wrote: > >>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote: > >>>>>> On 4/24/23 8:13?PM, Ming Lei wrote: > >>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote: > >>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote: > >>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote: > >>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote: > >>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote: > >>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always > >>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll() > >>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the > >>>>>>>>>>>> NO_OFFLOAD flags are set. > >>>>>>>>>>>> > >>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > >>>>>>>>>>>> --- > >>>>>>>>>>>> io_uring/io_uring.c | 2 +- > >>>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++-- > >>>>>>>>>>>> io_uring/opdef.h | 2 ++ > >>>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > >>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644 > >>>>>>>>>>>> --- a/io_uring/io_uring.c > >>>>>>>>>>>> +++ b/io_uring/io_uring.c > >>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags) > >>>>>>>>>>>> return -EBADF; > >>>>>>>>>>>> > >>>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD && > >>>>>>>>>>>> - (!req->file || !file_can_poll(req->file))) > >>>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq)) > >>>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK; > >>>>>>>>>>> > >>>>>>>>>>> I guess the check should be !def->always_iowq? > >>>>>>>>>> > >>>>>>>>>> How so? Nobody that takes pollable files should/is setting > >>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline > >>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN > >>>>>>>>>> returns if nonblock == true. > >>>>>>>>> > >>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and > >>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context > >>>>>>>>> directly. > >>>>>>>> > >>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of > >>>>>>>> it :-) > >>>>>>> > >>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are > >>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of > >>>>>>> ->always_iowq is a bit confusing? > >>>>>> > >>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll > >>>>>> be happy to take suggestions on what would make it clearer. > >>>>> > >>>>> Except for the naming, I am also wondering why these ->always_iowq OPs > >>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it > >>>>> shouldn't improve performance by doing so because these OPs are supposed > >>>>> to be slow and always slept, not like others(buffered writes, ...), > >>>>> can you provide one hint about not offloading these OPs? Or is it just that > >>>>> NO_OFFLOAD needs to not offload every OPs? > >>>> > >>>> The whole point of NO_OFFLOAD is that items that would normally be > >>>> passed to io-wq are just run inline. This provides a way to reap the > >>>> benefits of batched submissions and syscall reductions. Some opcodes > >>>> will just never be async, and io-wq offloads are not very fast. Some of > >>> > >>> Yeah, seems io-wq is much slower than inline issue, maybe it needs > >>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK. > >> > >> Indeed, depending on what is being linked, you may see io-wq activity > >> which is not ideal. > > > > That is why I prefer to fused command for ublk zero copy, because the > > registering buffer approach suggested by Pavel and Ziyang has to link > > register buffer OP with the actual IO OP, and it is observed that > > IOPS drops to 1/2 in 4k random io test with registered buffer approach. > > It'd be worth looking into if we can avoid io-wq for link execution, as > that'd be a nice win overall too. IIRC, there's no reason why it can't > be done like initial issue rather than just a lazy punt to io-wq. > > That's not really related to fused command support or otherwise for > that, it'd just be a generic improvement. But it may indeed make the > linekd approach viable for that too. Performance degrade with io-wq is just one reason, another thing is that the current link model doesn't support 1:N dependency, such as, if one buffer is registered, the following N OPs depend the registered the buffer, but actually all the N OPs(requests) have to be run one by one, that is one limit of current io_uring link model. Fused command actually performs pretty good because: 1) no io-wq is involved 2) allow the following N OPs to be issued concurrently 3) avoid to register the buffer to per-context data(we can ignore this part so far, cause the uring_lock should help us to avoid the contention) > > >>>> them can eventually be migrated to async support, if the underlying > >>>> mechanics support it. > >>>> > >>>> You'll note that none of the ->always_iowq opcodes are pollable. If > >>> > >>> True, then looks the ->always_iowq flag doesn't make a difference here > >>> because your patch clears IO_URING_F_NONBLOCK for !file_can_poll(req->file). > > Actually not sure that's the case, as we have plenty of ops that are not > pollable, yet are perfectly fine for a nonblocking issue. Things like > any read/write on a regular file or block device. But you mentioned "none of the ->always_iowq opcodes are pollable". If that isn't true, it is fine to add ->always_iowq. > > >> Yep, we may be able to just get rid of it. The important bit is really > >> getting rid of the forced setting of REQ_F_FORCE_ASYNC which the > >> previous reverts take care of. So we can probably just drop this one, > >> let me give it a spin. > >> > >>> Also almost all these ->always_iowq OPs are slow and blocked, if they are > >>> issued inline, the submission pipeline will be blocked. > >> > >> That is true, but that's very much the tradeoff you make by using > >> NO_OFFLOAD. I would expect any users of this to have two rings, one for > >> just batched submissions, and one for "normal" usage. Or maybe they only > >> do the batched submissions and one is fine. > > > > I guess that NO_OFFLOAD probably should be used for most of usecase, > > cause it does avoid slow io-wq if io-wq perf won't be improved. > > > > Also there is other issue for two rings, such as sync/communication > > between two rings, and single ring should be the easiest way. > > I think some use cases may indeed just use that and be fine with it, > also because it is probably not uncommon to bundle the issues and hence > not really mix and match for issue. But this is a vastly different use > case than fast IO cases, for storage and networking. Though those will > bypass that anyway as they can do nonblocking issue just fine. > > >>>> NO_OFFLOAD is setup, it's pointless NOT to issue them with NONBLOCK > >>>> cleared, as you'd just get -EAGAIN and then need to call them again with > >>>> NONBLOCK cleared from the same context. > >>> > >>> My point is that these OPs are slow and slept, so inline issue won't > >>> improve performance actually for them, and punting to io-wq couldn't > >>> be worse too. On the other side, inline issue may hurt perf because > >>> submission pipeline is blocked when issuing these OPs. > >> > >> That is definitely not true, it really depends on which ops it is. For a > >> lot of them, they don't generally block, but we have to be prepared for > > > > OK, but fsync/fallocate does block. > > They do, but statx, fadvise, madvise, rename, shutdown, etc (basically > all the rest of them) do not for a lot of cases. OK, but fsync/fallocate is often mixed with normal IOs. Thanks, Ming