On 5/28/24 5:04 PM, Jens Axboe wrote:
On 5/28/24 12:31 PM, Jens Axboe wrote:
I suspect a bug in the previous patches, because this is what the
forward port looks like. First, for reference, the current results:
Got it sorted, and pinned the sender and receiver to CPUs to avoid the
variation. It looks like this with the task_work approach that I sent
out as v1:
Latencies for: Sender
percentiles (nsec):
| 1.0000th=[ 2160], 5.0000th=[ 2672], 10.0000th=[ 2768],
| 20.0000th=[ 3568], 30.0000th=[ 3568], 40.0000th=[ 3600],
| 50.0000th=[ 3600], 60.0000th=[ 3600], 70.0000th=[ 3632],
| 80.0000th=[ 3632], 90.0000th=[ 3664], 95.0000th=[ 3696],
| 99.0000th=[ 4832], 99.5000th=[15168], 99.9000th=[16192],
| 99.9500th=[16320], 99.9900th=[18304]
Latencies for: Receiver
percentiles (nsec):
| 1.0000th=[ 1528], 5.0000th=[ 1576], 10.0000th=[ 1656],
| 20.0000th=[ 2040], 30.0000th=[ 2064], 40.0000th=[ 2064],
| 50.0000th=[ 2064], 60.0000th=[ 2064], 70.0000th=[ 2096],
| 80.0000th=[ 2096], 90.0000th=[ 2128], 95.0000th=[ 2160],
| 99.0000th=[ 3472], 99.5000th=[14784], 99.9000th=[15168],
| 99.9500th=[15424], 99.9900th=[17280]
and here's the exact same test run on the current patches:
Latencies for: Sender
percentiles (nsec):
| 1.0000th=[ 362], 5.0000th=[ 362], 10.0000th=[ 370],
| 20.0000th=[ 370], 30.0000th=[ 370], 40.0000th=[ 370],
| 50.0000th=[ 374], 60.0000th=[ 382], 70.0000th=[ 382],
| 80.0000th=[ 382], 90.0000th=[ 382], 95.0000th=[ 390],
| 99.0000th=[ 402], 99.5000th=[ 430], 99.9000th=[ 900],
| 99.9500th=[ 972], 99.9900th=[ 1432]
Latencies for: Receiver
percentiles (nsec):
| 1.0000th=[ 1528], 5.0000th=[ 1544], 10.0000th=[ 1560],
| 20.0000th=[ 1576], 30.0000th=[ 1592], 40.0000th=[ 1592],
| 50.0000th=[ 1592], 60.0000th=[ 1608], 70.0000th=[ 1608],
| 80.0000th=[ 1640], 90.0000th=[ 1672], 95.0000th=[ 1688],
| 99.0000th=[ 1848], 99.5000th=[ 2128], 99.9000th=[14272],
| 99.9500th=[14784], 99.9900th=[73216]
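For anyone reproducing these numbers, pinning each side to a CPU could look roughly like the sketch below. It is written in Python for brevity and is not the test app used here (which isn't shown in this mail); `pin_to_cpu` is a hypothetical helper, and `os.sched_setaffinity` is only available on Linux.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the caller to a single CPU and return the resulting CPU set."""
    allowed = os.sched_getaffinity(0)   # CPUs the caller may currently run on
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} not in allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})      # pid 0 means the caller itself
    return os.sched_getaffinity(0)
```

The sender and receiver would each call this with a different CPU number before starting the timed loop, so the scheduler cannot migrate them mid-run.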
I'll try to augment the test app to do properly rate-limited
submissions, so I can ramp up the rates a bit and see what happens.
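As a rough illustration of what rate-limited submission means here, the sketch below issues sends on a fixed schedule instead of back to back. This is an assumption about the general technique, not the test app's code: `paced_send` and its `send` callback are hypothetical names, and the clock and sleep functions are injectable only to keep the sketch easy to check.

```python
import time

NSEC_PER_SEC = 1_000_000_000

def _sleep_ns(ns):
    time.sleep(ns / NSEC_PER_SEC)

def paced_send(send, rate_hz, duration_s,
               now_ns=time.perf_counter_ns, sleep_ns=_sleep_ns):
    """Call send() at most rate_hz times per second for duration_s seconds.

    Returns the number of sends actually issued.
    """
    interval = NSEC_PER_SEC // rate_hz          # nanoseconds between send slots
    start = now_ns()
    deadline = start + duration_s * NSEC_PER_SEC
    next_due = start
    sent = 0
    while True:
        t = now_ns()
        if t >= deadline:
            break
        if t < next_due:
            # Too early for the next slot: wait, but never past the deadline.
            sleep_ns(min(next_due, deadline) - t)
            continue
        send()
        sent += 1
        # Slots are fixed relative to the start time, so a slow send does not
        # push later slots back; the loop catches up by sending immediately.
        next_due += interval
    return sent
```

A call such as `paced_send(submit, 33_000, 3)` would target 33K sends per second for 3 seconds, with `submit` standing in for whatever posts one message. At that rate the sleep granularity of a Python process is too coarse for accurate pacing, so this only shows the scheduling logic.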
And the final one, with the rate-limited sends sorted out. One key
observation is that v1 trails the current edition: it just can't keep up
as the rate is increased. If we cap the rate at what should be 33K
messages per second, v1 gets ~28K messages per second and has the
following latency profile (for a 3 second run):
Latencies for: Receiver (msg=83863)
percentiles (nsec):
| 1.0000th=[ 1208], 5.0000th=[ 1336], 10.0000th=[ 1400],
| 20.0000th=[ 1768], 30.0000th=[ 1912], 40.0000th=[ 1976],
| 50.0000th=[ 2040], 60.0000th=[ 2160], 70.0000th=[ 2256],
| 80.0000th=[ 2480], 90.0000th=[ 2736], 95.0000th=[ 3024],
| 99.0000th=[ 4080], 99.5000th=[ 4896], 99.9000th=[ 9664],
| 99.9500th=[ 17024], 99.9900th=[218112]
Latencies for: Sender (msg=83863)
percentiles (nsec):
| 1.0000th=[ 1928], 5.0000th=[ 2064], 10.0000th=[ 2160],
| 20.0000th=[ 2608], 30.0000th=[ 2672], 40.0000th=[ 2736],
| 50.0000th=[ 2864], 60.0000th=[ 2960], 70.0000th=[ 3152],
| 80.0000th=[ 3408], 90.0000th=[ 4128], 95.0000th=[ 4576],
| 99.0000th=[ 5920], 99.5000th=[ 6752], 99.9000th=[ 13376],
| 99.9500th=[ 22912], 99.9900th=[261120]
and the current edition does:
Latencies for: Sender (msg=94488)
percentiles (nsec):
| 1.0000th=[ 181], 5.0000th=[ 191], 10.0000th=[ 201],
| 20.0000th=[ 215], 30.0000th=[ 225], 40.0000th=[ 235],
| 50.0000th=[ 262], 60.0000th=[ 306], 70.0000th=[ 430],
| 80.0000th=[ 1004], 90.0000th=[ 2480], 95.0000th=[ 3632],
| 99.0000th=[ 8096], 99.5000th=[12352], 99.9000th=[18048],
| 99.9500th=[19584], 99.9900th=[23680]
Latencies for: Receiver (msg=94488)
percentiles (nsec):
| 1.0000th=[ 342], 5.0000th=[ 398], 10.0000th=[ 482],
| 20.0000th=[ 652], 30.0000th=[ 812], 40.0000th=[ 972],
| 50.0000th=[ 1240], 60.0000th=[ 1640], 70.0000th=[ 1944],
| 80.0000th=[ 2448], 90.0000th=[ 3248], 95.0000th=[ 5216],
| 99.0000th=[10304], 99.5000th=[12352], 99.9000th=[18048],
| 99.9500th=[19840], 99.9900th=[23168]
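The achieved rates follow directly from the msg= counts in the tables above and the 3 second runtime. A small sketch of that arithmetic, where `achieved_rate` is just an illustrative helper:

```python
def achieved_rate(messages, runtime_s):
    """Messages per second actually delivered over the run."""
    return messages / runtime_s

RUNTIME_S = 3        # "a 3 second run"
TARGET = 33_000      # requested cap, messages per second

# msg= counts taken from the two sets of tables for the 33K cap
runs = {"v1": 83863, "current": 94488}

for name, msgs in runs.items():
    rate = achieved_rate(msgs, RUNTIME_S)
    print(f"{name}: {rate:.0f} msg/s ({rate / TARGET:.0%} of target)")
```

This works out to about 27954 msg/s (85% of the target) for v1 and 31496 msg/s (95% of the target) for the current edition.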
If we cap it where v1 keeps up, at 13K messages per second, v1 does:
Latencies for: Receiver (msg=38820)
percentiles (nsec):
| 1.0000th=[ 1160], 5.0000th=[ 1256], 10.0000th=[ 1352],
| 20.0000th=[ 1688], 30.0000th=[ 1928], 40.0000th=[ 1976],
| 50.0000th=[ 2064], 60.0000th=[ 2384], 70.0000th=[ 2480],
| 80.0000th=[ 2768], 90.0000th=[ 3280], 95.0000th=[ 3472],
| 99.0000th=[ 4192], 99.5000th=[ 4512], 99.9000th=[ 6624],
| 99.9500th=[ 8768], 99.9900th=[14272]
Latencies for: Sender (msg=38820)
percentiles (nsec):
| 1.0000th=[ 1848], 5.0000th=[ 1928], 10.0000th=[ 2040],
| 20.0000th=[ 2608], 30.0000th=[ 2640], 40.0000th=[ 2736],
| 50.0000th=[ 3024], 60.0000th=[ 3120], 70.0000th=[ 3376],
| 80.0000th=[ 3824], 90.0000th=[ 4512], 95.0000th=[ 4768],
| 99.0000th=[ 5536], 99.5000th=[ 6048], 99.9000th=[ 9024],
| 99.9500th=[10304], 99.9900th=[23424]
and the current edition (v2) does:
Latencies for: Sender (msg=39005)
percentiles (nsec):
| 1.0000th=[ 191], 5.0000th=[ 211], 10.0000th=[ 262],
| 20.0000th=[ 342], 30.0000th=[ 382], 40.0000th=[ 402],
| 50.0000th=[ 450], 60.0000th=[ 532], 70.0000th=[ 1080],
| 80.0000th=[ 1848], 90.0000th=[ 4768], 95.0000th=[10944],
| 99.0000th=[16512], 99.5000th=[18304], 99.9000th=[22400],
| 99.9500th=[26496], 99.9900th=[41728]
Latencies for: Receiver (msg=39005)
percentiles (nsec):
| 1.0000th=[ 410], 5.0000th=[ 604], 10.0000th=[ 700],
| 20.0000th=[ 900], 30.0000th=[ 1128], 40.0000th=[ 1320],
| 50.0000th=[ 1672], 60.0000th=[ 2256], 70.0000th=[ 2736],
| 80.0000th=[ 3760], 90.0000th=[ 5408], 95.0000th=[11072],
| 99.0000th=[18304], 99.5000th=[20096], 99.9000th=[24704],
| 99.9500th=[27520], 99.9900th=[35584]