Re: SQPOLL / uring_cmd_iopoll

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/22/23 04:13, Jens Axboe wrote:
> On 4/21/23 4:09?PM, Bernd Schubert wrote:
>> Hello,
>>
>> I was wondering if I could set up SQPOLL for fuse/IORING_OP_URING_CMD
>> and what would be the latency win. Now I get a bit confused what the
>> f_op->uring_cmd_iopoll() function is supposed to do.
> 
> Certainly, you can use SQPOLL with anything. Whether or not it'd be a
> win depends a lot on what you're doing, rate of IO, etc.
> 
> IOPOLL and SQPOLL are two different things. SQPOLL has a kernel side
> submission thread that polls for new SQ entries and submits them when it
> sees them. IOPOLL is a method for avoiding sleeping on waiting on CQ
> entries, where it will instead poll the target for completion instead.
> That's where ->uring_cmd_iopoll() comes in, that's the hook for polling
> for uring commands. For normal fs path read/write requests,
> ->uring_iopoll() is the hook that performs the same kind of action.
> 
>> Is it just there to check if SQEs are can be completed as CQE? In rw.c
> 
> Not sure I follow what you're trying to convey here, maybe you can
> expand on that? And maybe some of the confusion here is because of
> mixing up SQPOLL and IOPOLL?

Thanks a lot for your help!

I was confused when f_op->uring_cmd_iopoll gets called - if I needed to
check myself if the SQE was submitted or if this for CQE submission.
You already resolved my confusion with your comments below. Thanks a
lot for your help!

> 
>> io_do_iopoll() it looks like this. I don't follow all code paths in
>> __io_sq_thread yet, but it looks a like it already checks if the ring
>> has new entries
>>
>> to_submit = io_sqring_entries(ctx);
>> ...
>> ret = io_submit_sqes(ctx, to_submit);
>>
>>     --> it will eventually call into ->uring_cmd() ?
> 
> The SQPOLL thread will pull off new SQEs, and those will then at some
> point hit ->issue() which is an opcode dependent method for issuing the
> actual request. Once it's been issued, if the ring is IOPOLL, then
> io_iopoll_req_issued() will get called which adds the request to an
> internal poll list. When someone does io_uring_enter(2) to wait for
> events on a ring with IOPOLL, it will iterate that list and call
> ->uring_cmd_iopoll() for uring_cmd requests, and ->uring_iopoll() for
> "normal" requests.

Thanks, that basically answered my SQE confusion - it does all itself.

> 
> If the ring is using SQPOLL|IOPOLL, then the SQPOLL thread is also the
> one that does the polling. See __io_sq_thread() -> io_do_iopoll().
> 
>> And then io_do_iopoll ->  file->f_op->uring_cmd_iopoll is supposed to
>> check for available cq entries and will submit these? I.e. I just return
>> 1 if when the request is ready? And also ensure that
>> req->iopoll_completed is set?
> 
> The callback polls for a completion on the target side, which will mark
> is as ->iopoll_completed = true. That still leaves them on the iopoll
> list, and io_do_iopoll() will spot that and post CQEs for them.
> 
>> I'm also not sure what I should do with struct io_comp_batch * - I don't
>> have struct request *req_list anywhere in my fuse-uring changes, seems
>> to be blk-mq specific? So I should just ignore that parameter?
> 
> Hard to say since the above is a bit confusing and I haven't seen your
> code, but you can always start off just passing NULL. That's fine and
> just doesn't do any completion batching. The latter may or may not be
> useful for your case, but in any case, it's fine to pass NULL.

I had send patches to fsdevel and given it is mostly fuse related, didn't
add you to CC
https://lwn.net/Articles/926773/

The code is also here

https://github.com/bsbernd/linux/tree/fuse-uring-for-6.2
https://github.com/bsbernd/libfuse/tree/uring


I got SQPOLL working using this simple function

/**
  * This is called for requests when the ring is configured with
  * IORING_SETUP_IOPOLL.
  */
int fuse_uring_cmd_poll(struct io_uring_cmd *cmd, struct io_comp_batch *iob,
			unsigned int poll_flags)
{

	/* Not much to be done here, when IORING_SETUP_IOPOLL is set
	 * io_uring_cmd_done() already sets req->iopoll_completed.
	 * The caller (io_do_iopoll) already checks this flag
	 * and won't enter this function at all then.
	 * When we get called we just need to return 0 and tell the
          * caller that the cmd is not ready yet.
	 */
	return 0;
}


Just gave it a quick file creation/removal run with single threaded bonnie++
and performance is actually lower than before (around 8000 creates/s without
IORING_SETUP_SQPOLL (adding IORING_SETUP_IOPOLL doesn't help either) and
about 5000 creates/s with IORING_SETUP_SQPOLL). With plain /dev/fuse
it is about 2000 creates/s.
Main improvement comes from ensuring request submission
(application) and request handling (ring/thread) are on the same core.
I'm running into some scheduler issues, which I work around for now using
migrate_disable()/migrate_enable() in before/after fuse request waitq,
without that performance for metadata requests is similar to plain
/dev/fuse.

I will soon post an RFC v2 series with more benchmark results including
read and write.


> 
>> Btw, this might be useful for ublk as well?
> 
> Not sure what "this" is :-)

Ming already replied.



Thanks,
Bernd






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux