On 10/12/24 16:38, Jens Axboe wrote:
> On 10/11/24 7:55 PM, Ming Lei wrote:
>> On Fri, Oct 11, 2024 at 4:56 AM Bernd Schubert
>> <bernd.schubert@xxxxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> as discussed during LPC, we would like to have large CQE sizes, at least
>>> 256B. Ideally 256B for fuse, but CQE512 might be a bit too much...
>>>
>>> Pavel said that this should be ok, but it would be better to have the CQE
>>> size as function argument.
>>> Could you give me some hints how this should look like and especially how
>>> we are going to communicate the CQE size to the kernel? I guess just adding
>>> IORING_SETUP_CQE256 / IORING_SETUP_CQE512 would be much easier.
>>>
>>> I'm basically through with other changes Miklos had been asking for and
>>> moving fuse headers into the CQE is next.
>>
>> Big CQE may not be efficient, there are copy from kernel to CQE and
>> from CQE to userspace. And not flexible, it is one ring-wide property,
>> if it is big,
>> any CQE from this ring has to be big.
>
> There isn't really a copy - the kernel fills it in, generally the
> application itself, just in the kernel, and then the application can
> read it on that side. It's the same memory, and it'll also generally be
> cache hot when the application reaps it. Unless a lot of time has passed,
> obviously.
>
> That said, yeah bigger sqe/cqe is less ideal than smaller ones,
> obviously. Currently you can fit 4 normal cqes in a cache line, or a
> single sqe. Making either of them bigger will obviously bloat that.
>
>> If you are saying uring_cmd, another way is to mapped one area for
>> this purpose, the fuse driver can write fuse headers to this indexed
>> mmap buffer, and userspace read it, which is just efficient, without
>> io_uring core changes. ublk uses this way to fill IO request header.
>> But it requires each command to have a unique tag.
>
> That may indeed be a decent idea for this too. You don't even need fancy
> tagging, you can just use the cqe index for your tag too, as it should
> not be bigger than the cq ring space. Then you can get away with
> just using normal cqe sizes, and just have a shared region between the
> two where data gets written by the uring_cmd completion, and the app can
> access it directly from userspace.

Would be good if Miklos could chime in here. Adding back mmap for headers
wouldn't be difficult, but it would add back more fuse-uring startup and
tear-down code.
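
For illustration only, here is a rough userspace sketch of the indexed shared-region idea discussed above: completions stay at the normal 16-byte size and carry just a slot index, while the header bytes live in a separate area with one fixed-size slot per CQ entry. Everything in it is an assumption made for the sketch and not an existing io_uring or fuse interface: `struct small_cqe`, `struct hdr_region`, `HDR_SLOT_SIZE`, and the helper functions are all hypothetical names, the tag is carried in `user_data` purely for simplicity, and `calloc()` stands in for a region that the driver and the application would both map.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_SLOT_SIZE 256u /* assumed per-request header size */

/* Hypothetical 16-byte completion, standing in for a normal-sized CQE. */
struct small_cqe {
	uint64_t user_data; /* carries the slot index (the "tag") */
	int32_t  res;       /* header bytes written, or negative on error */
	uint32_t flags;
};

/* Hypothetical header area: one fixed-size slot per CQ ring entry. */
struct hdr_region {
	uint8_t *base;
	uint32_t nr_slots;
};

static int hdr_region_init(struct hdr_region *r, uint32_t cq_entries)
{
	/* calloc() stands in for memory shared between driver and app. */
	r->base = calloc(cq_entries, HDR_SLOT_SIZE);
	r->nr_slots = cq_entries;
	return r->base ? 0 : -1;
}

/* Map a tag to its slot; the tag must be below the number of CQ entries. */
static uint8_t *hdr_slot(const struct hdr_region *r, uint32_t tag)
{
	assert(tag < r->nr_slots);
	return r->base + (size_t)tag * HDR_SLOT_SIZE;
}

/* Producer side (the driver's role): write the header, post a small CQE. */
static struct small_cqe complete_with_header(struct hdr_region *r,
					     uint32_t tag,
					     const void *hdr, uint32_t len)
{
	struct small_cqe cqe = { .user_data = tag, .res = -1, .flags = 0 };

	if (len > HDR_SLOT_SIZE)
		return cqe; /* header does not fit into one slot */
	memcpy(hdr_slot(r, tag), hdr, len);
	cqe.res = (int32_t)len;
	return cqe;
}

/* Consumer side (the app's role): look up the slot named by the CQE. */
static const uint8_t *reap_header(const struct hdr_region *r,
				  const struct small_cqe *cqe, uint32_t *len)
{
	if (cqe->res < 0)
		return NULL;
	*len = (uint32_t)cqe->res;
	return hdr_slot(r, (uint32_t)cqe->user_data);
}
```

A caller would set the region up once with `hdr_region_init(&r, cq_entries)`, and for every completion it reaps call `reap_header(&r, &cqe, &len)` to get a pointer straight into the slot, with no larger CQE format involved. The trade-off is the one raised in the thread: the region has to be created and torn down alongside the ring.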