On 10/11/24 12:35 PM, Bernd Schubert wrote:
> On 10/11/24 19:57, Jens Axboe wrote:
>> On 10/10/24 2:56 PM, Bernd Schubert wrote:
>>> Hello,
>>>
>>> as discussed during LPC, we would like to have large CQE sizes, at least
>>> 256B. Ideally 256B for fuse, but CQE512 might be a bit too much...
>>>
>>> Pavel said that this should be ok, but it would be better to have the CQE
>>> size as function argument.
>>> Could you give me some hints how this should look like and especially how
>>> we are going to communicate the CQE size to the kernel? I guess just adding
>>> IORING_SETUP_CQE256 / IORING_SETUP_CQE512 would be much easier.
>>
>> Not Pavel and unfortunately I could not be at that LPC discussion, but
>> yeah I don't see why not just adding the necessary SETUP arg for this
>> would not be the way to go. As long as they are power-of-2, then all
>> it'll impact on both the kernel and liburing side is what size shift to
>> use when iterating CQEs.
>
> Thanks, Pavel also wanted power-of-2, although 512 is a bit much for fuse.
> Well, maybe 256 will be sufficient. Going to look into adding that parameter
> during the next days.

We really have to keep it pow-of-2 just to avoid convoluting the logic
(and overhead) of iterating the CQ ring and CQEs. You can search for
IORING_SETUP_CQE32 in the kernel to see how it's just a shift, and ditto
on the liburing side.

Curious, what's all the space needed for?

>> Since this obviously means larger CQ rings, one nice side effect is that
>> since 6.10 we don't need contig pages to map any of the rings. So should
>> work just fine regardless of memory fragmentation, where previously that
>> would've been a concern.
>>
>
> Out of interest, what is the change? Up to fuse-io-uring rfc2 I was
> vmalloced buffers for fuse that got mmaped - was working fine. Miklos just
> wants to avoid that kernel allocates large chunks of memory on behalf of
> users.

It was the change that got rid of remap_pfn_range() for mapping, and
switched to vm_insert_page(s) instead. Memory overhead should generally
not be too bad, it's all about sizing the rings appropriately. The much
bigger concern is needing contig memory, as that can become scarce after
longer uptimes, even with plenty of memory free. This is particularly
important if you need 512b CQEs, obviously.

-- 
Jens Axboe
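
For illustration only, here is a minimal sketch of why power-of-2 CQE
sizes reduce to "just a shift" when indexing the CQ ring. It is not
taken from the kernel or liburing: every name below (BASE_CQE_SIZE,
cqe_shift_for_size, cqe_offset) is hypothetical. The only assumption
carried over from the real layout is that a regular CQE is 16 bytes, so
a 32-byte CQE corresponds to a shift of 1, and 256-byte or 512-byte
CQEs would correspond to shifts of 4 and 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel or liburing code. */

#define BASE_CQE_SIZE 16u	/* size of a regular io_uring CQE in bytes */

/* log2(cqe_size / 16): 0 for 16B, 1 for 32B, 4 for 256B, 5 for 512B. */
static unsigned cqe_shift_for_size(unsigned cqe_size)
{
	unsigned shift = 0;

	/* Only power-of-2 sizes of at least one base CQE map onto a shift. */
	assert(cqe_size >= BASE_CQE_SIZE && (cqe_size & (cqe_size - 1)) == 0);
	while ((BASE_CQE_SIZE << shift) < cqe_size)
		shift++;
	return shift;
}

/*
 * Byte offset of the CQE at ring position 'head'. 'entries' is the number
 * of CQ ring slots and is assumed to be a power of 2, so wrapping is a mask.
 */
static size_t cqe_offset(uint32_t head, uint32_t entries, unsigned shift)
{
	uint32_t mask = entries - 1;

	/* Slot index scaled by the shift, counted in base-CQE units. */
	return ((size_t)(head & mask) << shift) * BASE_CQE_SIZE;
}
```

With a size that is not a power of 2, the shift in cqe_offset() would
have to become a multiply by an arbitrary per-ring size, which is the
extra logic and overhead the power-of-2 restriction avoids.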