On 12/07/2020 23:29, Jens Axboe wrote:
> On 7/12/20 11:42 AM, Pavel Begunkov wrote:
>> io_kiocb::task_work was de-unionised, and is not planned to be shared
>> back, because it's too useful and commonly used. Hence, instead of
>> keeping a separate task_work in struct io_async_rw just reuse
>> req->task_work.
>
> This is a good idea, req->task_work is a first class citizen these days.
> Unfortunately it doesn't do much good for io_async_ctx, since it's so
> huge with the msghdr related bits. It'd be nice to do something about
> that too, though not a huge priority as allocating async context is

We could allocate only the particular member that a request needs rather
than the entire struct/union io_async_ctx. That should be a bit better for
writes.

And if we can save another 16B in io_async_rw, it would fit in 3 cachelines.
E.g. there are two 4B holes in struct wait_page_queue: one comes from
"int bit_nr", the second is inside "wait_queue_entry_t wait".

# pahole -C io_async_ctx ./fs/io_uring.o
struct io_async_ctx {
	union {
		struct io_async_rw         rw;           /*     0   208 */
		struct io_async_msghdr     msg;          /*     0   368 */
		struct io_async_connect    connect;      /*     0   128 */
		struct io_timeout_data     timeout __attribute__((__aligned__(8))); /*     0    96 */
	} __attribute__((__aligned__(8)));               /*     0   368 */

	/* size: 368, cachelines: 6, members: 1 */
	/* forced alignments: 1 */
	/* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));

> somewhat of a slow path. Though with the proliferation of task_work,
> it's no longer nearly as expensive as it used to be with the async
> thread offload. Could be argued to be a full-on fast path these days.
>
> Applied, thanks.
>

-- 
Pavel Begunkov