On 9/4/24 18:42, Jens Axboe wrote:
> Overall I think this looks pretty reasonable from an io_uring point of
> view. Some minor comments in the replies that would need to get
> resolved, and we'll need to get Ming's buffer work done to reap the dio
> benefits.
>
> I ran a quick benchmark here, doing 4k buffered random reads from a big
> file. I see about 25% improvement for that case, and notably at half the
> CPU usage.

That is a bit low for my needs, but you will definitely need to wake up
on the same core, which is not applied in this patch version. I also need
to re-test with current kernel versions, but I think even that is not
perfect.

We had a rather long discussion here
https://lore.kernel.org/lkml/d9151806-c63a-c1da-12ad-c9c1c7039785@xxxxxxx/T/#r58884ee2c68f9ac5fdb89c4e3a968007ff08468e
and there is a seesaw hack, which makes it work perfectly. Then I got
persistently distracted with other work, so I haven't yet tracked down
why __wake_up_on_current_cpu didn't work. Back then it was also still
only a patch and not yet in Linux. I need to retest and possibly figure
out where the task switch happens.

Also, if you are testing with buffered writes, the v2 series had more
optimizations, like a core+1 hack for async IO. I think in order to get
it landed and to agree on the approach with Miklos, it is better to first
remove all these optimizations and then fix it later... Though for
performance testing that is not optimal.


Thanks,
Bernd