On 6/4/24 12:57 PM, Jens Axboe wrote:
>> If the swing back is that expensive, another option is to
>> allocate a new request and let the target ring to deallocate
>> it once the message is delivered (similar to that overflow
>> entry).
>
> I can give it a shot, and then run some testing. If we get close enough
> with the latencies and performance, then I'd certainly be more amenable
> to going either route.
>
> We'd definitely need to pass in the required memory and avoid the return
> round trip, as that basically doubles the cost (and latency) of sending
> a message. The downside of what you suggest here is that while that
> should integrate nicely with existing local task_work, it'll also mean
> that we'll need hot path checks for treating that request type as a
> special thing. Things like req->ctx being not local, freeing the request
> rather than recycling, etc. And that'll need to happen in multiple
> spots.

On top of that, you also need CQE memory for the other side, it's not
just the req itself. Otherwise you don't know if it'll post or not, in
case of low memory situations.

I dunno, I feel like this solution would get a lot more complicated than
it is now, rather than make it simpler.

-- 
Jens Axboe