On 5/28/24 5:04 PM, Jens Axboe wrote:
> On 5/28/24 12:31 PM, Jens Axboe wrote:
>> I suspect a bug in the previous patches, because this is what the
>> forward port looks like. First, for reference, the current results:
>
> Got it sorted, and pinned sender and receiver on CPUs to avoid the
> variation. It looks like this with the task_work approach that I sent
> out as v1:
>
> Latencies for: Sender
>     percentiles (nsec):
>      | 1.0000th=[ 2160], 5.0000th=[ 2672], 10.0000th=[ 2768],
>      | 20.0000th=[ 3568], 30.0000th=[ 3568], 40.0000th=[ 3600],
>      | 50.0000th=[ 3600], 60.0000th=[ 3600], 70.0000th=[ 3632],
>      | 80.0000th=[ 3632], 90.0000th=[ 3664], 95.0000th=[ 3696],
>      | 99.0000th=[ 4832], 99.5000th=[15168], 99.9000th=[16192],
>      | 99.9500th=[16320], 99.9900th=[18304]
> Latencies for: Receiver
>     percentiles (nsec):
>      | 1.0000th=[ 1528], 5.0000th=[ 1576], 10.0000th=[ 1656],
>      | 20.0000th=[ 2040], 30.0000th=[ 2064], 40.0000th=[ 2064],
>      | 50.0000th=[ 2064], 60.0000th=[ 2064], 70.0000th=[ 2096],
>      | 80.0000th=[ 2096], 90.0000th=[ 2128], 95.0000th=[ 2160],
>      | 99.0000th=[ 3472], 99.5000th=[14784], 99.9000th=[15168],
>      | 99.9500th=[15424], 99.9900th=[17280]
>
> and here's the exact same test run on the current patches:
>
> Latencies for: Sender
>     percentiles (nsec):
>      | 1.0000th=[ 362], 5.0000th=[ 362], 10.0000th=[ 370],
>      | 20.0000th=[ 370], 30.0000th=[ 370], 40.0000th=[ 370],
>      | 50.0000th=[ 374], 60.0000th=[ 382], 70.0000th=[ 382],
>      | 80.0000th=[ 382], 90.0000th=[ 382], 95.0000th=[ 390],
>      | 99.0000th=[ 402], 99.5000th=[ 430], 99.9000th=[ 900],
>      | 99.9500th=[ 972], 99.9900th=[ 1432]
> Latencies for: Receiver
>     percentiles (nsec):
>      | 1.0000th=[ 1528], 5.0000th=[ 1544], 10.0000th=[ 1560],
>      | 20.0000th=[ 1576], 30.0000th=[ 1592], 40.0000th=[ 1592],
>      | 50.0000th=[ 1592], 60.0000th=[ 1608], 70.0000th=[ 1608],
>      | 80.0000th=[ 1640], 90.0000th=[ 1672], 95.0000th=[ 1688],
>      | 99.0000th=[ 1848], 99.5000th=[ 2128], 99.9000th=[14272],
>      | 99.9500th=[14784], 99.9900th=[73216]
>
> I'll
> try and augment the test app to do proper rated submissions, so I
> can ramp up the rates a bit and see what happens.

And the final one, with the rated sends sorted out. One key observation
is that v1 trails the current edition; it just can't keep up as the rate
is increased. If we cap the rate at what should be 33K messages per
second, v1 gets ~28K messages per second and has the following latency
profile (for a 3 second run):

Latencies for: Receiver (msg=83863)
    percentiles (nsec):
     | 1.0000th=[ 1208], 5.0000th=[ 1336], 10.0000th=[ 1400],
     | 20.0000th=[ 1768], 30.0000th=[ 1912], 40.0000th=[ 1976],
     | 50.0000th=[ 2040], 60.0000th=[ 2160], 70.0000th=[ 2256],
     | 80.0000th=[ 2480], 90.0000th=[ 2736], 95.0000th=[ 3024],
     | 99.0000th=[ 4080], 99.5000th=[ 4896], 99.9000th=[ 9664],
     | 99.9500th=[ 17024], 99.9900th=[218112]
Latencies for: Sender (msg=83863)
    percentiles (nsec):
     | 1.0000th=[ 1928], 5.0000th=[ 2064], 10.0000th=[ 2160],
     | 20.0000th=[ 2608], 30.0000th=[ 2672], 40.0000th=[ 2736],
     | 50.0000th=[ 2864], 60.0000th=[ 2960], 70.0000th=[ 3152],
     | 80.0000th=[ 3408], 90.0000th=[ 4128], 95.0000th=[ 4576],
     | 99.0000th=[ 5920], 99.5000th=[ 6752], 99.9000th=[ 13376],
     | 99.9500th=[ 22912], 99.9900th=[261120]

and the current edition does:

Latencies for: Sender (msg=94488)
    percentiles (nsec):
     | 1.0000th=[ 181], 5.0000th=[ 191], 10.0000th=[ 201],
     | 20.0000th=[ 215], 30.0000th=[ 225], 40.0000th=[ 235],
     | 50.0000th=[ 262], 60.0000th=[ 306], 70.0000th=[ 430],
     | 80.0000th=[ 1004], 90.0000th=[ 2480], 95.0000th=[ 3632],
     | 99.0000th=[ 8096], 99.5000th=[12352], 99.9000th=[18048],
     | 99.9500th=[19584], 99.9900th=[23680]
Latencies for: Receiver (msg=94488)
    percentiles (nsec):
     | 1.0000th=[ 342], 5.0000th=[ 398], 10.0000th=[ 482],
     | 20.0000th=[ 652], 30.0000th=[ 812], 40.0000th=[ 972],
     | 50.0000th=[ 1240], 60.0000th=[ 1640], 70.0000th=[ 1944],
     | 80.0000th=[ 2448], 90.0000th=[ 3248], 95.0000th=[ 5216],
     | 99.0000th=[10304], 99.5000th=[12352], 99.9000th=[18048],
     | 99.9500th=[19840], 99.9900th=[23168]

If we
cap it where v1 keeps up, at 13K messages per second, v1 does:

Latencies for: Receiver (msg=38820)
    percentiles (nsec):
     | 1.0000th=[ 1160], 5.0000th=[ 1256], 10.0000th=[ 1352],
     | 20.0000th=[ 1688], 30.0000th=[ 1928], 40.0000th=[ 1976],
     | 50.0000th=[ 2064], 60.0000th=[ 2384], 70.0000th=[ 2480],
     | 80.0000th=[ 2768], 90.0000th=[ 3280], 95.0000th=[ 3472],
     | 99.0000th=[ 4192], 99.5000th=[ 4512], 99.9000th=[ 6624],
     | 99.9500th=[ 8768], 99.9900th=[14272]
Latencies for: Sender (msg=38820)
    percentiles (nsec):
     | 1.0000th=[ 1848], 5.0000th=[ 1928], 10.0000th=[ 2040],
     | 20.0000th=[ 2608], 30.0000th=[ 2640], 40.0000th=[ 2736],
     | 50.0000th=[ 3024], 60.0000th=[ 3120], 70.0000th=[ 3376],
     | 80.0000th=[ 3824], 90.0000th=[ 4512], 95.0000th=[ 4768],
     | 99.0000th=[ 5536], 99.5000th=[ 6048], 99.9000th=[ 9024],
     | 99.9500th=[10304], 99.9900th=[23424]

and v2 does:

Latencies for: Sender (msg=39005)
    percentiles (nsec):
     | 1.0000th=[ 191], 5.0000th=[ 211], 10.0000th=[ 262],
     | 20.0000th=[ 342], 30.0000th=[ 382], 40.0000th=[ 402],
     | 50.0000th=[ 450], 60.0000th=[ 532], 70.0000th=[ 1080],
     | 80.0000th=[ 1848], 90.0000th=[ 4768], 95.0000th=[10944],
     | 99.0000th=[16512], 99.5000th=[18304], 99.9000th=[22400],
     | 99.9500th=[26496], 99.9900th=[41728]
Latencies for: Receiver (msg=39005)
    percentiles (nsec):
     | 1.0000th=[ 410], 5.0000th=[ 604], 10.0000th=[ 700],
     | 20.0000th=[ 900], 30.0000th=[ 1128], 40.0000th=[ 1320],
     | 50.0000th=[ 1672], 60.0000th=[ 2256], 70.0000th=[ 2736],
     | 80.0000th=[ 3760], 90.0000th=[ 5408], 95.0000th=[11072],
     | 99.0000th=[18304], 99.5000th=[20096], 99.9000th=[24704],
     | 99.9500th=[27520], 99.9900th=[35584]

-- 
Jens Axboe
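
As a rough cross-check of the throughput figures quoted in the message, the achieved rates can be derived from the msg= counts and the run length. This is only a sketch: the 3 second duration is stated for the 33K case and is assumed here for the 13K case as well, and the dictionary layout and the achieved_rate helper are illustrative names, not part of the test app.

```python
# Run length in seconds. Stated for the 33K case in the message;
# ASSUMED to be the same for the 13K case.
RUN_SECONDS = 3

# Message counts copied from the "msg=" fields of the latency output.
# "current" is the current edition of the patches (also called v2).
runs = {
    ("33K cap", "v1"): 83863,
    ("33K cap", "current"): 94488,
    ("13K cap", "v1"): 38820,
    ("13K cap", "current"): 39005,
}


def achieved_rate(messages, seconds=RUN_SECONDS):
    """Average messages per second over the whole run (hypothetical helper)."""
    return messages / seconds


rates = {key: achieved_rate(count) for key, count in runs.items()}

for (cap, version), rate in rates.items():
    print(f"{cap:8s} {version:8s} {rate:10.0f} msg/sec")
```

Under that assumption, the 33K case works out to roughly 28K messages per second for v1, matching the figure in the text, and roughly 31.5K for the current edition, while both versions land close to 13K in the lower-rate case.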