On 4/22/23 04:13, Jens Axboe wrote: > On 4/21/23 4:09?PM, Bernd Schubert wrote: >> Hello, >> >> I was wondering if I could set up SQPOLL for fuse/IORING_OP_URING_CMD >> and what would be the latency win. Now I get a bit confused what the >> f_op->uring_cmd_iopoll() function is supposed to do. > > Certainly, you can use SQPOLL with anything. Whether or not it'd be a > win depends a lot on what you're doing, rate of IO, etc. > > IOPOLL and SQPOLL are two different things. SQPOLL has a kernel side > submission thread that polls for new SQ entries and submits them when it > sees them. IOPOLL is a method for avoiding sleeping on waiting on CQ > entries, where it will instead poll the target for completion instead. > That's where ->uring_cmd_iopoll() comes in, that's the hook for polling > for uring commands. For normal fs path read/write requests, > ->uring_iopoll() is the hook that performs the same kind of action. > >> Is it just there to check if SQEs are can be completed as CQE? In rw.c > > Not sure I follow what you're trying to convey here, maybe you can > expand on that? And maybe some of the confusion here is because of > mixing up SQPOLL and IOPOLL? Thanks a lot for your help! I was confused when f_op->uring_cmd_iopoll gets called - if I needed to check myself if the SQE was submitted or if this for CQE submission. You already resolved my confusion with your comments below. Thanks a lot for your help! > >> io_do_iopoll() it looks like this. I don't follow all code paths in >> __io_sq_thread yet, but it looks a like it already checks if the ring >> has new entries >> >> to_submit = io_sqring_entries(ctx); >> ... >> ret = io_submit_sqes(ctx, to_submit); >> >> --> it will eventually call into ->uring_cmd() ? > > The SQPOLL thread will pull off new SQEs, and those will then at some > point hit ->issue() which is an opcode dependent method for issuing the > actual request. Once it's been issued, if the ring is IOPOLL, then > io_iopoll_req_issued() will get called which adds the request to an > internal poll list. When someone does io_uring_enter(2) to wait for > events on a ring with IOPOLL, it will iterate that list and call > ->uring_cmd_iopoll() for uring_cmd requests, and ->uring_iopoll() for > "normal" requests. Thanks, that basically answered my SQE confusion - it does all itself. > > If the ring is using SQPOLL|IOPOLL, then the SQPOLL thread is also the > one that does the polling. See __io_sq_thread() -> io_do_iopoll(). > >> And then io_do_iopoll -> file->f_op->uring_cmd_iopoll is supposed to >> check for available cq entries and will submit these? I.e. I just return >> 1 if when the request is ready? And also ensure that >> req->iopoll_completed is set? > > The callback polls for a completion on the target side, which will mark > is as ->iopoll_completed = true. That still leaves them on the iopoll > list, and io_do_iopoll() will spot that and post CQEs for them. > >> I'm also not sure what I should do with struct io_comp_batch * - I don't >> have struct request *req_list anywhere in my fuse-uring changes, seems >> to be blk-mq specific? So I should just ignore that parameter? > > Hard to say since the above is a bit confusing and I haven't seen your > code, but you can always start off just passing NULL. That's fine and > just doesn't do any completion batching. The latter may or may not be > useful for your case, but in any case, it's fine to pass NULL. I had send patches to fsdevel and given it is mostly fuse related, didn't add you to CC https://lwn.net/Articles/926773/ The code is also here https://github.com/bsbernd/linux/tree/fuse-uring-for-6.2 https://github.com/bsbernd/libfuse/tree/uring I got SQPOLL working using this simple function /** * This is called for requests when the ring is configured with * IORING_SETUP_IOPOLL. */ int fuse_uring_cmd_poll(struct io_uring_cmd *cmd, struct io_comp_batch *iob, unsigned int poll_flags) { /* Not much to be done here, when IORING_SETUP_IOPOLL is set * io_uring_cmd_done() already sets req->iopoll_completed. * The caller (io_do_iopoll) already checks this flag * and won't enter this function at all then. * When we get called we just need to return 0 and tell the * caller that the cmd is not ready yet. */ return 0; } Just gave it a quick file creation/removal run with single threaded bonnie++ and performance is actually lower than before (around 8000 creates/s without IORING_SETUP_SQPOLL (adding IORING_SETUP_IOPOLL doesn't help either) and about 5000 creates/s with IORING_SETUP_SQPOLL). With plain /dev/fuse it is about 2000 creates/s. Main improvement comes from ensuring request submission (application) and request handling (ring/thread) are on the same core. I'm running into some scheduler issues, which I work around for now using migrate_disable()/migrate_enable() in before/after fuse request waitq, without that performance for metadata requests is similar to plain /dev/fuse. I will soon post an RFC v2 series with more benchmark results including read and write. > >> Btw, this might be useful for ublk as well? > > Not sure what "this" is :-) Ming already replied. Thanks, Bernd