On 5/28/24 12:31 PM, Jens Axboe wrote:
> I suspect a bug in the previous patches, because this is what the
> forward port looks like. First, for reference, the current results:

Got it sorted, and pinned sender and receiver on CPUs to avoid the
variation. It looks like this with the task_work approach that I sent
out as v1:

Latencies for: Sender
    percentiles (nsec):
     |  1.0000th=[ 2160],  5.0000th=[ 2672], 10.0000th=[ 2768],
     | 20.0000th=[ 3568], 30.0000th=[ 3568], 40.0000th=[ 3600],
     | 50.0000th=[ 3600], 60.0000th=[ 3600], 70.0000th=[ 3632],
     | 80.0000th=[ 3632], 90.0000th=[ 3664], 95.0000th=[ 3696],
     | 99.0000th=[ 4832], 99.5000th=[15168], 99.9000th=[16192],
     | 99.9500th=[16320], 99.9900th=[18304]
Latencies for: Receiver
    percentiles (nsec):
     |  1.0000th=[ 1528],  5.0000th=[ 1576], 10.0000th=[ 1656],
     | 20.0000th=[ 2040], 30.0000th=[ 2064], 40.0000th=[ 2064],
     | 50.0000th=[ 2064], 60.0000th=[ 2064], 70.0000th=[ 2096],
     | 80.0000th=[ 2096], 90.0000th=[ 2128], 95.0000th=[ 2160],
     | 99.0000th=[ 3472], 99.5000th=[14784], 99.9000th=[15168],
     | 99.9500th=[15424], 99.9900th=[17280]

and here's the exact same test run on the current patches:

Latencies for: Sender
    percentiles (nsec):
     |  1.0000th=[  362],  5.0000th=[  362], 10.0000th=[  370],
     | 20.0000th=[  370], 30.0000th=[  370], 40.0000th=[  370],
     | 50.0000th=[  374], 60.0000th=[  382], 70.0000th=[  382],
     | 80.0000th=[  382], 90.0000th=[  382], 95.0000th=[  390],
     | 99.0000th=[  402], 99.5000th=[  430], 99.9000th=[  900],
     | 99.9500th=[  972], 99.9900th=[ 1432]
Latencies for: Receiver
    percentiles (nsec):
     |  1.0000th=[ 1528],  5.0000th=[ 1544], 10.0000th=[ 1560],
     | 20.0000th=[ 1576], 30.0000th=[ 1592], 40.0000th=[ 1592],
     | 50.0000th=[ 1592], 60.0000th=[ 1608], 70.0000th=[ 1608],
     | 80.0000th=[ 1640], 90.0000th=[ 1672], 95.0000th=[ 1688],
     | 99.0000th=[ 1848], 99.5000th=[ 2128], 99.9000th=[14272],
     | 99.9500th=[14784], 99.9900th=[73216]

I'll try and augment the test app to do proper rated submissions, so I
can ramp up the rates a bit and see what happens.
-- Jens Axboe