On 5/31/24 21:10, Jens Axboe wrote:
> On 5/31/24 11:36 AM, Bernd Schubert wrote:
>> On 5/31/24 18:24, Jens Axboe wrote:
>>> On 5/29/24 12:00 PM, Bernd Schubert wrote:
>>>> This is to avoid using async completion tasks
>>>> (i.e. context switches) when not needed.
>>>>
>>>> Cc: io-uring@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Bernd Schubert <bschubert@xxxxxxx>
>>>
>>> This patch is very confusing, even after having pulled the other
>>> changes. In general, would be great if the io_uring list was CC'ed on
>>
>> Hmm, let me try to explain. And yes, I definitely need to add these details
>> to the commit message.
>>
>> Without the patch:
>>
>> <sending a struct fuse_req>
>>
>> fuse_uring_queue_fuse_req
>> fuse_uring_send_to_ring
>> io_uring_cmd_complete_in_task
>>
>> <async task runs>
>> io_uring_cmd_done()
>
> And this is a worthwhile optimization, you always want to complete it
> inline if at all possible. But none of this logic or code belongs in fuse,
> it really should be provided by io_uring helpers.
>
> I would just drop this patch for now and focus on the core
> functionality. Send out a version with that, and then we'll be happy to
> help make this as performant as it can be. This is where the ask on "how to
> reproduce your numbers" comes from - with that, it's usually trivial to
> spot areas where things could be improved. And I strongly suspect that
> will involve providing you with the right API to use here, and perhaps
> refactoring a bit on the fuse side. Making up issue_flags is _really_
> not something a user should do.

Great that you agree; I don't like the issue_flags handling in the fuse
code either. I will also follow your suggestion to drop this patch.
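To make the two completion paths concrete, here is a small userspace C model (not kernel code) of what the patch was trying to avoid: parking the completion as task work that runs later, versus posting it directly. `struct model_cmd`, `complete_in_task()`, `run_task_work()` and `cmd_done()` are hypothetical stand-ins for io_uring_cmd_complete_in_task() and io_uring_cmd_done(); the real io_uring API and its locking rules differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an io_uring command, not the kernel struct. */
struct model_cmd {
	int result;      /* value that would be posted to the CQE */
	bool completed;  /* set once the "done" step has run */
};

/* Hypothetical task-work queue: completions parked here run later,
 * modelling the extra hop through io_uring_cmd_complete_in_task(). */
#define MAX_PENDING 16
static struct model_cmd *pending[MAX_PENDING];
static size_t npending;

/* Models io_uring_cmd_done(): post the result immediately (inline). */
static void cmd_done(struct model_cmd *cmd, int result)
{
	assert(!cmd->completed); /* a command completes exactly once */
	cmd->result = result;
	cmd->completed = true;
}

/* Models io_uring_cmd_complete_in_task(): only queue the command; the
 * completion happens when the owning task later runs its task work. */
static bool complete_in_task(struct model_cmd *cmd, int result)
{
	if (npending == MAX_PENDING)
		return false;
	cmd->result = result;
	pending[npending++] = cmd;
	return true;
}

/* Models the owning task running its queued task work (the extra
 * context switch). Returns how many completions were run. */
static size_t run_task_work(void)
{
	size_t ran = npending;

	for (size_t i = 0; i < npending; i++)
		cmd_done(pending[i], pending[i]->result);
	npending = 0;
	return ran;
}
```

In this model the inline path finishes in one call, while the deferred path leaves the command incomplete until `run_task_work()` executes, which is the latency and context switch the patch aimed to save.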
>
>> 1) (current == queue->server_task)
>> fuse_uring_cmd (IORING_OP_URING_CMD) received a completion for a
>> previous fuse_req, after completion it fetched the next fuse_req and
>> wants to send it - for 'current == queue->server_task' issue flags
>> got stored in struct fuse_ring_queue::uring_cmd_issue_flags
>
> And queue->server_task is the owner of the ring? Then yes that is safe

Yeah, it is the thread that submits SQEs - it should be the owner of the
ring, unless the daemon side does something wrong (given that there are
several userspace implementations and not only libfuse, we need to expect
and handle implementation errors, though).

>>
>> 2) 'else if (current->io_uring)'
>>
>> (actually documented in the code)
>>
>> 2.1 This might be through IORING_OP_URING_CMD as well, but then server
>> side uses multiple threads to access the same ring - not nice. We only
>> store issue_flags into the queue for 'current == queue->server_task', so
>> we do not know issue_flags - sending through task is needed.
>
> What's the path leading to you not having the issue_flags?

We get issue flags here, but I want to keep changes to fuse small and
want to avoid changing the signatures of functions not related to uring.
That is why we store issue_flags for the presumed ring owner thread in the
queue data structure, but we don't have them for other threads.

Example:

IORING_OP_URING_CMD
fuse_uring_cmd
fuse_uring_commit_and_release
fuse_uring_req_end_and_get_next --> until here issue_flags passed
fuse_request_end -> generic fuse function, issue_flags not passed
req->args->end() / fuse_writepage_end
fuse_simple_background
fuse_request_queue_background
fuse_request_queue_background_uring
fuse_uring_queue_fuse_req
fuse_uring_send_to_ring
io_uring_cmd_done

I.e. we had issue_flags up to fuse_uring_req_end_and_get_next(), but then
call into generic fuse functions and stop passing through issue_flags.
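The bookkeeping for cases 1 and 2 can be sketched as a userspace C model: the command handler stashes issue_flags in the queue only when running as the ring owner, and the send path, which no longer receives the flags as a parameter once the generic fuse functions are involved, recovers them from the queue or falls back to task work. `struct model_task`, `struct model_queue`, `queue_store_issue_flags()` and `choose_send_mode()` are hypothetical names for illustration, not the patch's code; case 3 (async vs. sync) is not modelled, and the sketch conservatively defers whenever the flags are unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task, not the kernel's task_struct. */
struct model_task {
	int pid;
};

/* Hypothetical stand-in for struct fuse_ring_queue. */
struct model_queue {
	const struct model_task *server_task; /* presumed ring owner */
	unsigned int uring_cmd_issue_flags;   /* stashed by the owner */
	bool flags_valid;
};

enum send_mode {
	SEND_INLINE,   /* complete directly with the stashed issue_flags */
	SEND_VIA_TASK, /* flags unknown: defer to task work */
};

/* Models the command handler stashing its issue_flags, but only when
 * it runs as the ring owner; other threads are not recorded (2.1). */
static void queue_store_issue_flags(struct model_queue *q,
				    const struct model_task *cur,
				    unsigned int flags)
{
	assert(q != NULL && cur != NULL);
	if (cur != q->server_task)
		return;
	q->uring_cmd_issue_flags = flags;
	q->flags_valid = true;
}

/* Models the send decision. Note that no flags parameter is taken:
 * the generic fuse path dropped them, so they come from the queue. */
static enum send_mode choose_send_mode(const struct model_queue *q,
				       const struct model_task *cur,
				       unsigned int *flags_out)
{
	if (cur == q->server_task && q->flags_valid) {
		*flags_out = q->uring_cmd_issue_flags; /* case 1 */
		return SEND_INLINE;
	}
	/* Case 2: current->io_uring only says that *some* ring exists for
	 * this task, not that it owns this one, so the flags are unknown. */
	return SEND_VIA_TASK;
}
```

A per-thread table (the array or rb-tree mentioned below) would remove the owner-only restriction, at the cost of extra lookups on a path designed around one thread per core.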
For the ring owner we take the issue flags stored by fuse_uring_cmd() in
struct fuse_ring_queue, but if the daemon side uses multiple threads to
access the ring we won't have that. Well, we could allow it and store them
in an array or rb-tree, but I don't like multiple threads accessing
something that is already optimized to have a thread per core.

>
>> 2.2 This might be an application request through the mount point, through
>> the io-uring interface. We don't know the issue flags either.
>> (That one was actually a surprise for me, when xfstests caught it.
>> Initially I had a condition to send without the extra task, then lockdep
>> caught that.)
>
> In general, if you don't know the context (e.g. you don't have issue_flags
> passed in), you should probably assume the only way to sanely proceed
> is to have it processed by the task itself.
>
>>
>> In both cases it has to use a task.
>>
>>
>> My question here is if 'current->io_uring' is reliable.
>
> Yes that will be reliable in the sense that it tells you that the
> current task has (at least) one io_uring context setup. But it doesn't
> tell you anything beyond that, like if it's the owner of this request.

Yeah, you can see that it just checks for current->io_uring and then uses
a task.

>
>> 3) everything else
>>
>> 3.1) For async requests, the interesting ones here are cached reads and
>> writes. At a minimum, writes are holding a spin lock and that lock
>> conflicts with the mutex io-uring is taking - we need a task as well.
>>
>> 3.2) sync - no lock is held, it can send without the extra task.
>
> As mentioned, let's drop this patch 19 for now. Send out what you have
> with instructions on how to test it, and I'll give it a spin and see
> what we can do about this.
>
>>> Outside of that, would be super useful to include a blurb on how you set
>>> things up for testing, and how you run the testing. That would really
That would really >>> help in terms of being able to run and test it, and also to propose >>> changes that might make a big difference. >>> >> >> Will do in the next version. >> You basically need my libfuse uring branch >> (right now commit history is not cleaned up) and follow >> instructions in <libfuse>/xfstests/README.md how to run xfstests. >> Missing is a slight patch for that dir to set extra daemon parameters, >> like direct-io (fuse' FOPEN_DIRECT_IO) and io-uring. Will add that libfuse >> during the next days. > > I'll leave the xfstests to you for now, but running some perf testing > just to verify how it's being used would be useful and help improve it > for sure. > Ah you meant performance tests. I used libfuse/example/passthrough_hp from my uring branch and then fio on top of that for reads/writes and mdtest from the ior repo for metadata. Maybe I should upload my scripts somewhere. Thanks, Beernd