On 10/11/24 20:39, Jens Axboe wrote:
> On 10/11/24 12:35 PM, Bernd Schubert wrote:
>> On 10/11/24 19:57, Jens Axboe wrote:
>>> On 10/10/24 2:56 PM, Bernd Schubert wrote:
>>>> Hello,
>>>>
>>>> as discussed during LPC, we would like to have large CQE sizes, at least
>>>> 256B. Ideally 256B for fuse, but CQE512 might be a bit too much...
>>>>
>>>> Pavel said that this should be ok, but it would be better to have the CQE
>>>> size as function argument.
>>>> Could you give me some hints how this should look like and especially how
>>>> we are going to communicate the CQE size to the kernel? I guess just adding
>>>> IORING_SETUP_CQE256 / IORING_SETUP_CQE512 would be much easier.
>>>
>>> Not Pavel and unfortunately I could not be at that LPC discussion, but
>>> yeah I don't see why not just adding the necessary SETUP arg for this
>>> would not be the way to go. As long as they are power-of-2, then all
>>> it'll impact on both the kernel and liburing side is what size shift to
>>> use when iterating CQEs.
>>
>> Thanks, Pavel also wanted power-of-2, although 512 is a bit much for fuse.
>> Well, maybe 256 will be sufficient. Going to look into adding that parameter
>> during the next days.
>
> We really have to keep it pow-of-2 just to avoid convoluting the logic
> (and overhead) of iterating the CQ ring and CQEs. You can search for
> IORING_SETUP_CQE32 in the kernel to see how it's just a shift, and ditto
> on the liburing side.

Thanks, going to look into it.

>
> Curious, what's all the space needed for?

The basic fuse header: struct fuse_in_header -> currently 40B,
and the per-request headers, I think the current max is 64.
And then some extra compat space for both, so that they can be safely
extended in the future (which is currently an issue).

>
>>> Since this obviously means larger CQ rings, one nice side effect is that
>>> since 6.10 we don't need contig pages to map any of the rings. So should
>>> work just fine regardless of memory fragmentation, where previously that
>>> would've been a concern.
>>>
>>
>> Out of interest, what is the change? Up to fuse-io-uring rfc2 I was
>> vmalloced buffers for fuse that got mmaped - was working fine. Miklos just
>> wants to avoid that kernel allocates large chunks of memory on behalf of
>> users.
>
> It was the change that got rid of remap_pfn_range() for mapping, and
> switched to vm_insert_page(s) instead. Memory overhead should generally
> not be too bad, it's all about sizing the rings appropriately. The much
> bigger concern is needing contig memory, as that can become scarce after
> longer uptimes, even with plenty of memory free. This is particularly
> important if you need 512b CQEs, obviously.
>

For sure, I was just curious what you had changed. I think I had looked into
that io-uring code around 2 years ago. Going to look into the updated io-uring
code, thanks for the hint.

For fuse I was just using remap_vmalloc_range().

https://lore.kernel.org/all/20240529-fuse-uring-for-6-9-rfc2-out-v1-7-d149476b1d65@xxxxxxx/


Thanks,
Bernd
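
For illustration only, here is a minimal userspace sketch of why power-of-2
CQE sizes keep CQ iteration down to "just a shift". It is not the kernel or
liburing code; the helper names cqe_size_to_shift(), cqe_offset() and
cq_bytes() are hypothetical, and it assumes the CQEs sit back to back in one
array whose entry count is also a power of two. The last helper shows how
quickly the CQE storage grows with the entry size, which is where the
contiguous-memory concern comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power-of-2 CQE size
 * (16 -> 4, 32 -> 5, 256 -> 8, 512 -> 9).
 * Returns -1 if the size is zero or not a power of two. */
static int cqe_size_to_shift(size_t cqe_size)
{
    int shift = 0;

    if (cqe_size == 0 || (cqe_size & (cqe_size - 1)) != 0)
        return -1;
    while (((size_t)1 << shift) < cqe_size)
        shift++;
    return shift;
}

/* Byte offset of the CQE at ring position `head`.
 * `ring_mask` is entries - 1; the mask wraps the index and the
 * shift scales it by the CQE size, with no multiply or divide. */
static size_t cqe_offset(uint32_t head, uint32_t ring_mask, int shift)
{
    return (size_t)(head & ring_mask) << shift;
}

/* Total bytes of CQE storage for a ring with `entries` slots. */
static size_t cq_bytes(uint32_t entries, int shift)
{
    return (size_t)entries << shift;
}
```

With these assumptions, a 4096-entry CQ needs 64 KiB of CQE storage at 16B
per CQE, 1 MiB at 256B and 2 MiB at 512B.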