Re: Large CQE for fuse headers

Bernd Schubert <bernd.schubert@xxxxxxxxxxx> · Fri, 11 Oct 2024 21:03:34 +0200

On 10/11/24 20:39, Jens Axboe wrote:
> On 10/11/24 12:35 PM, Bernd Schubert wrote:
>> On 10/11/24 19:57, Jens Axboe wrote:
>>> On 10/10/24 2:56 PM, Bernd Schubert wrote:
>>>> Hello,
>>>>
>>>> as discussed during LPC, we would like to have large CQE sizes, at least
>>>> 256B. Ideally 256B for fuse, but CQE512 might be a bit too much...
>>>>
>>>> Pavel said that this should be ok, but it would be better to have the CQE
>>>> size as a function argument.
>>>> Could you give me some hints on what this should look like and especially how
>>>> we are going to communicate the CQE size to the kernel? I guess just adding
>>>> IORING_SETUP_CQE256 / IORING_SETUP_CQE512 would be much easier.
>>>
>>> Not Pavel and unfortunately I could not be at that LPC discussion, but
>>> yeah I don't see why not just adding the necessary SETUP arg for this
>>> would not be the way to go. As long as they are power-of-2, then all
>>> it'll impact on both the kernel and liburing side is what size shift to
>>> use when iterating CQEs.
>>
>> Thanks, Pavel also wanted power-of-2, although 512 is a bit much for fuse.
>> Well, maybe 256 will be sufficient. Going to look into adding that parameter
>> over the next few days.
> 
> We really have to keep it pow-of-2 just to avoid convoluting the logic
> (and overhead) of iterating the CQ ring and CQEs. You can search for
> IORING_SETUP_CQE32 in the kernel to see how it's just a shift, and ditto
> on the liburing side.

Thanks, going to look into it.
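
For illustration, a minimal sketch of why power-of-2 CQE sizes reduce to a
shift when indexing the CQ ring. It assumes the ring is addressed in units of
the 16-byte base CQE, as with IORING_SETUP_CQE32; the helper names are
hypothetical and not taken from the kernel or liburing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the base CQE (user_data, res, flags) without any extra payload. */
#define BASE_CQE_SIZE 16u

/*
 * Hypothetical helper: shift relative to the 16-byte base CQE.
 * 16B -> 0, 32B -> 1 (the CQE32 case), 256B -> 4, 512B -> 5.
 */
static inline unsigned cqe_shift_for_size(unsigned cqe_size)
{
    unsigned shift = 0;

    /* Only power-of-2 sizes of at least one base CQE map to a shift. */
    assert(cqe_size >= BASE_CQE_SIZE && (cqe_size & (cqe_size - 1)) == 0);
    while ((BASE_CQE_SIZE << shift) < cqe_size)
        shift++;
    return shift;
}

/*
 * Hypothetical helper: byte offset of the CQE at position 'head' in a ring
 * with 'ring_entries' entries (itself a power of 2).
 */
static inline size_t cqe_offset(uint32_t head, uint32_t ring_entries,
                                unsigned shift)
{
    uint32_t mask = ring_entries - 1;

    /* Slot index in base-CQE units, then scaled to bytes. */
    return ((size_t)(head & mask) << shift) * BASE_CQE_SIZE;
}
```

A non-power-of-2 size would need a multiply (and a size lookup) on every
CQE access instead of the mask-and-shift above.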

> 
> Curious, what's all the space needed for?

The basic fuse header, struct fuse_in_header, is currently 40B, plus the
per-request headers; I think the current max there is 64B.

And then some extra compat space for both, so that they can be safely
extended in the future (which is currently an issue).
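
As a rough sketch of the arithmetic, the following computes how much spare
space different CQE sizes would leave. It assumes the extra payload is the CQE
size minus the 16-byte base CQE, reads the "64" above as bytes, and uses a
local mirror of struct fuse_in_header (layout as in recent kernels) so that it
compiles on its own; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror of struct fuse_in_header, reproduced only to make the
 * sketch self-contained. Not the uapi definition itself.
 */
struct fuse_in_header_mirror {
    uint32_t len;
    uint32_t opcode;
    uint64_t unique;
    uint64_t nodeid;
    uint32_t uid;
    uint32_t gid;
    uint32_t pid;
    uint16_t total_extlen;
    uint16_t padding;
};

_Static_assert(sizeof(struct fuse_in_header_mirror) == 40,
               "fuse_in_header is expected to be 40 bytes");

#define BASE_CQE_SIZE    16u /* base CQE without extra payload */
#define FUSE_REQ_HDR_MAX 64u /* assumption: largest per-request header, in bytes */

/*
 * Hypothetical helper: bytes left over for future header extensions.
 * A negative result means the two headers do not fit at all.
 */
static inline int fuse_cqe_spare_bytes(unsigned cqe_size)
{
    unsigned payload, needed;

    assert(cqe_size >= BASE_CQE_SIZE);
    payload = cqe_size - BASE_CQE_SIZE;
    needed = (unsigned)sizeof(struct fuse_in_header_mirror) + FUSE_REQ_HDR_MAX;
    return (int)payload - (int)needed;
}
```

Under these assumptions a 128B CQE would leave only 8 spare bytes, while a
256B CQE leaves 136, which is where the compat space would go.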

> 
>>> Since this obviously means larger CQ rings, one nice side effect is that
>>> since 6.10 we don't need contig pages to map any of the rings. So should
>>> work just fine regardless of memory fragmentation, where previously that
>>> would've been a concern.
>>>
>>
>> Out of interest, what is the change? Up to fuse-io-uring rfc2 I was using
>> vmalloced buffers for fuse that got mmaped - was working fine. Miklos just
>> wants to avoid the kernel allocating large chunks of memory on behalf of
>> users.
> 
> It was the change that got rid of remap_pfn_range() for mapping, and
> switched to vm_insert_page(s) instead. Memory overhead should generally
> not be too bad, it's all about sizing the rings appropriately. The much
> bigger concern is needing contig memory, as that can become scarce after
> longer uptimes, even with plenty of memory free. This is particularly
> important if you need 512b CQEs, obviously.
> 

For sure, I was just curious what you had changed. I think I had looked into
that io-uring code around 2 years ago. Going to look into the updated
io-uring code, thanks for the hint.
For fuse I was just using remap_vmalloc_range().

https://lore.kernel.org/all/20240529-fuse-uring-for-6-9-rfc2-out-v1-7-d149476b1d65@xxxxxxx/

Thanks,
Bernd