Re: [PATCH RFC v3 00/17] fuse: fuse-over-io-uring

Jens Axboe <axboe@xxxxxxxxx> · Wed, 4 Sep 2024 13:41:51 -0600

On 9/4/24 1:37 PM, Bernd Schubert wrote:
> 
> 
> On 9/4/24 18:42, Jens Axboe wrote:
>> Overall I think this looks pretty reasonable from an io_uring point of
>> view. Some minor comments in the replies that would need to get
>> resolved, and we'll need to get Ming's buffer work done to reap the dio
>> benefits.
>>
>> I ran a quick benchmark here, doing 4k buffered random reads from a big
>> file. I see about 25% improvement for that case, and notably at half the
>> CPU usage.
> 
> That is a bit low for my needs, but you will definitely need to wake up on
> the same core - not applied in this patch version. I also need to re-test
> with current kernel versions, but I think even that is not perfect.
> 
> We had a rather long discussion here
> https://lore.kernel.org/lkml/d9151806-c63a-c1da-12ad-c9c1c7039785@xxxxxxx/T/#r58884ee2c68f9ac5fdb89c4e3a968007ff08468e
> and there is a seesaw hack, which makes it work perfectly. 
> Then I got persistently distracted with other work, so I have not yet
> tracked down why __wake_up_on_current_cpu didn't work. Back then it was
> also still only a patch and not in Linux yet. I need to retest and possibly
> figure out where the task switch happens.

I'll give it a look, wasn't too worried about it as we're also still
missing the zero copy bits. More concerned with just getting the core of
it sane, which I think we're pretty close to. Then we can work on making
it even faster post that.

> Also, if you are testing with buffered writes,
> the v2 series had more optimizations, like a core+1 hack for async IO.
> I think in order to get it landed and to agree on the approach with
> Miklos it is better to first remove all these optimizations and then
> fix it later... Though for performance testing it is not optimal.

Exactly, that's why I objected to some of the v2 io_uring hackery that
just wasn't palatable.

-- 
Jens Axboe