On 11/26/24 8:07 AM, lizetao wrote:
> Hi,
>
>> On 11/23/24 5:23 AM, lizetao wrote:
>>> Hi
>>>
>>>>> On 11/19/24 1:12 AM, lizetao wrote:
>>>>> Adds support for doing chmod through io_uring. IORING_OP_FCHMOD
>>>>> behaves like fchmod(2) and takes the same arguments.
>>>
>>>> Looks pretty straightforward. The only downside is the forced use
>>>> of REQ_F_FORCE_ASYNC - did you look into how feasible it would be
>>>> to allow non-blocking issue of this? Would imagine the majority of
>>>> fchmod calls end up not blocking in the first place.
>>>
>>> Yes, I considered fchmod to allow asynchronous execution and wrote a
>>> test case to test it, the results are as follows:
>>>
>>> fchmod:
>>> real 0m1.413s
>>> user 0m0.253s
>>> sys 0m1.079s
>>>
>>> io_uring + fchmod:
>>> real 0m1.268s
>>> user 0m0.015s
>>> sys 0m5.739s
>>>
>>> There is about a 10% improvement.
>
>> And that makes sense if you're keeping some fchmod inflight, as you'd
>> generally just have one io-wq processing them and running things in
>> parallel with submission. But what if you keep an inflight count of
>> 1, eg do sync fchmod? Then it'd be considerably slower than the
>> syscall.
>
> Indeed, when performing REQ_F_FORCE_ASYNC operations at depth 1,
> performance is degraded. The results are as follows:
>
> fchmod:
> real 0m2.285s
> user 0m0.050s
> sys 0m1.996s
>
> io_uring + fchmod:
> real 0m2.541s
> user 0m0.013s
> sys 0m2.379s

That's what I expected. But actually looks like io-wq does a good job in
this case, that's pretty close.

>> This isn't necessarily something to worry about, but fact is that if
>> you can do a nonblock issue and have it succeed most of the time,
>> that'll be more efficient (and faster for low/sync fchmod) than
>> something that just offloads to io-wq. You can see that from your
>> results too, comparing the sys number between the two.
>
> However, when I remove REQ_F_FORCE_ASYNC and use IO_URING_F_NONBLOCK,
> the performance is not improved.
> The measured results are as follows:
>
> fchmod:
> real 0m2.132s
> user 0m0.048s
> sys 0m1.845s
>
> io_uring + fchmod:
> real 0m2.196s
> user 0m0.005s
> sys 0m2.097s

You would not expect it to be faster, as it's really just doing the same
work through a different mechanism. I'd expect that to roughly be within
normal variance, and if you're not doing a submit_and_wait mechanism (eg
you're doing submit and wait separately, hence doing 2 syscalls for each
fchmod), then that likely explains the discrepancy, if there is any.

And you'd also need to actually be able to remove REQ_F_FORCE_ASYNC to
have this as something that could be included. Otherwise if vfs_fchmod()
blocks, then you're now stalling the whole pipeline. Removing it just as
a test is fine, as you did.

>> Hence why I'm asking if you looked into doing a nonblocking issue at
>> all. This won't necessarily gate the inclusion of the patch, and it
>> is something that can be changed down the line, I'm mostly just
>> curious.
>
> Does this result meet expectations? Or maybe I missed something,
> please let me know

Yep, that looks like what I expected. io-wq offload will be fine if
you're doing a bunch of fchmod, in fact it'll probably end up being
faster as you reported. But if you're doing a single (or a few) fchmod
at a time, then io-wq offload will be a bit slower.

-- 
Jens Axboe
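
For reference, the plain "fchmod" side of the depth-1 comparison above
could be measured with something like the sketch below. This is not the
test program used in the thread (which was not posted); the function
name, the path and iteration count parameters, and the mode toggling are
all assumptions made for illustration. It only uses fchmod(2) and
clock_gettime(2), so it covers the syscall baseline and not the io_uring
side.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: time `iters` back-to-back fchmod(2) calls on one
 * file, i.e. the synchronous depth-1 baseline. Returns elapsed wall
 * clock seconds, or -1.0 on error. The file is created if missing.
 */
double time_sync_fchmod(const char *path, long iters)
{
	struct timespec start, end;
	int fd = open(path, O_CREAT | O_RDWR, 0644);

	if (fd < 0)
		return -1.0;

	if (clock_gettime(CLOCK_MONOTONIC, &start) < 0) {
		close(fd);
		return -1.0;
	}

	for (long i = 0; i < iters; i++) {
		/* Alternate modes so every call actually changes the inode. */
		mode_t mode = (i & 1) ? 0644 : 0600;

		if (fchmod(fd, mode) < 0) {
			close(fd);
			return -1.0;
		}
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end) < 0) {
		close(fd);
		return -1.0;
	}
	close(fd);

	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_sync_fchmod() with a scratch file and a large iteration
count gives a wall clock number to set against an io_uring run of the
same count. For that comparison to be fair at depth 1, the io_uring side
should submit and wait in a single syscall, as noted above.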