On 5/30/24 16:23, Jens Axboe wrote:
> Hi,
>
> For v1 and replies to that and tons of perf measurements, go here:
>
> https://lore.kernel.org/io-uring/3d553205-0fe2-482e-8d4c-a4a1ad278893@xxxxxxxxx/T/#m12f44c0a9ee40a59b0dcc226e22a0d031903aa73
>
> as I won't duplicate them in here. Performance has been improved since
> v1 as well, as the slab accounting is gone and we now rely solely on
> the completion_lock on the issuer side.
>
> Changes since v1:
> - Change commit messages to reflect it's DEFER_TASKRUN, not
>   SINGLE_ISSUER
> - Get rid of the need to double lock on the target uring_lock
> - Relax the check for needing remote posting, and then finally kill it
> - Unify it across ring types
> - Kill (now) unused callback_head in io_msg
> - Add overflow caching to avoid __GFP_ACCOUNT overhead
> - Rebase on current git master with 6.9 and 6.10 fixes pulled in

I'd really prefer the task_work version rather than carving out yet
another path specific to msg_ring. Perf might sound better, but it
duplicates wake-up paths, isn't integrated with batch waiting, it's not
clear how it affects different workloads given the target locking, and
it would behave oddly in terms of ordering.

If the swing back is that expensive, another option is to allocate a
new request and let the target ring deallocate it once the message is
delivered (similar to that overflow entry).
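
The allocate-on-sender, free-on-target option can be illustrated with a
toy userspace model. This is only a sketch and not kernel code:
toy_msg, toy_ring, toy_msg_send() and toy_ring_drain() are hypothetical
names made up for the illustration, and locking, wakeups and CQ
overflow handling are all left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message entry, loosely analogous to an overflow entry. */
struct toy_msg {
	struct toy_msg *next;
	uint64_t user_data;
	int32_t res;
};

/* Hypothetical target ring: a FIFO of pending messages plus counters. */
struct toy_ring {
	struct toy_msg *head;
	struct toy_msg *tail;
	unsigned delivered;
	uint64_t last_user_data;
};

/*
 * Sender side: allocate the entry and hand ownership to the target.
 * The sender never touches the entry again, so there is no swing back.
 * Returns 0 on success, -1 if the allocation fails.
 */
int toy_msg_send(struct toy_ring *target, uint64_t user_data, int32_t res)
{
	struct toy_msg *msg;

	assert(target != NULL);
	msg = malloc(sizeof(*msg));
	if (!msg)
		return -1;

	msg->next = NULL;
	msg->user_data = user_data;
	msg->res = res;

	/* Append at the tail so messages are delivered in send order. */
	if (target->tail)
		target->tail->next = msg;
	else
		target->head = msg;
	target->tail = msg;
	return 0;
}

/*
 * Target side: deliver pending messages in FIFO order and free each
 * entry once it has been delivered. Returns the number delivered.
 */
unsigned toy_ring_drain(struct toy_ring *target)
{
	struct toy_msg *msg;
	unsigned n = 0;

	assert(target != NULL);
	msg = target->head;
	target->head = target->tail = NULL;

	while (msg) {
		struct toy_msg *next = msg->next;

		/* Stand-in for posting the completion on the target. */
		target->last_user_data = msg->user_data;
		target->delivered++;

		free(msg);	/* the target owns and releases the entry */
		msg = next;
		n++;
	}
	return n;
}
```

In this model the entry's lifetime ends on the target side, which is
the point of the suggestion: the sender's only cost is the allocation
and the queueing.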
-- Pavel Begunkov