Re: [PATCHSET 0/3] Improve MSG_RING SINGLE_ISSUER performance

On 5/29/24 02:35, Jens Axboe wrote:
On 5/28/24 5:04 PM, Jens Axboe wrote:
On 5/28/24 12:31 PM, Jens Axboe wrote:
I suspect a bug in the previous patches, because this is what the
forward port looks like. First, for reference, the current results:

Got it sorted, and pinned sender and receiver on CPUs to avoid the
variation. It looks like this with the task_work approach that I sent
out as v1:

Latencies for: Sender
     percentiles (nsec):
      |  1.0000th=[ 2160],  5.0000th=[ 2672], 10.0000th=[ 2768],
      | 20.0000th=[ 3568], 30.0000th=[ 3568], 40.0000th=[ 3600],
      | 50.0000th=[ 3600], 60.0000th=[ 3600], 70.0000th=[ 3632],
      | 80.0000th=[ 3632], 90.0000th=[ 3664], 95.0000th=[ 3696],
      | 99.0000th=[ 4832], 99.5000th=[15168], 99.9000th=[16192],
      | 99.9500th=[16320], 99.9900th=[18304]
Latencies for: Receiver
     percentiles (nsec):
      |  1.0000th=[ 1528],  5.0000th=[ 1576], 10.0000th=[ 1656],
      | 20.0000th=[ 2040], 30.0000th=[ 2064], 40.0000th=[ 2064],
      | 50.0000th=[ 2064], 60.0000th=[ 2064], 70.0000th=[ 2096],
      | 80.0000th=[ 2096], 90.0000th=[ 2128], 95.0000th=[ 2160],
      | 99.0000th=[ 3472], 99.5000th=[14784], 99.9000th=[15168],
      | 99.9500th=[15424], 99.9900th=[17280]

and here's the exact same test run on the current patches:

Latencies for: Sender
     percentiles (nsec):
      |  1.0000th=[  362],  5.0000th=[  362], 10.0000th=[  370],
      | 20.0000th=[  370], 30.0000th=[  370], 40.0000th=[  370],
      | 50.0000th=[  374], 60.0000th=[  382], 70.0000th=[  382],
      | 80.0000th=[  382], 90.0000th=[  382], 95.0000th=[  390],
      | 99.0000th=[  402], 99.5000th=[  430], 99.9000th=[  900],
      | 99.9500th=[  972], 99.9900th=[ 1432]
Latencies for: Receiver
     percentiles (nsec):
      |  1.0000th=[ 1528],  5.0000th=[ 1544], 10.0000th=[ 1560],
      | 20.0000th=[ 1576], 30.0000th=[ 1592], 40.0000th=[ 1592],
      | 50.0000th=[ 1592], 60.0000th=[ 1608], 70.0000th=[ 1608],
      | 80.0000th=[ 1640], 90.0000th=[ 1672], 95.0000th=[ 1688],
      | 99.0000th=[ 1848], 99.5000th=[ 2128], 99.9000th=[14272],
      | 99.9500th=[14784], 99.9900th=[73216]
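
As an aside, the CPU pinning mentioned above presumably amounts to
something like the sketch below; pthread_setaffinity_np and the CPU
numbers are assumptions about the test app, not the actual code:

/* Pin the calling thread to one CPU so sender and receiver don't
 * migrate between runs.  CPUs 0 and 1 are placeholders. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
	/* sender thread would call pin_to_cpu(0), receiver pin_to_cpu(1) */
	return pin_to_cpu(0);
}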

I'll try and augment the test app to do proper rated submissions, so I
can ramp up the rates a bit and see what happens.
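
One simple way to do rated submissions is an absolute-deadline pacing
loop, sketched below; RATE_HZ and send_one() are placeholders, not the
real test app:

/* Send at a fixed rate using absolute deadlines so scheduling jitter
 * doesn't accumulate into the effective rate. */
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC	1000000000ULL
#define RATE_HZ		33000ULL	/* e.g. 33K messages per second */

static void send_one(void)
{
	/* MSG_RING submission would go here */
}

int main(void)
{
	uint64_t period = NSEC_PER_SEC / RATE_HZ;
	struct timespec next;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (int i = 0; i < 100000; i++) {
		send_one();
		/* advance the deadline by one period, then sleep to it */
		next.tv_nsec += period;
		if (next.tv_nsec >= (long)NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
	}
	return 0;
}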

And the final one, with the rated sends sorted out. One key observation
is that v1 trails the current edition; it just can't keep up as the rate
is increased. If we cap the rate at what should be 33K messages per
second, v1 manages ~28K messages/sec and has the following latency
profile (for a 3 second run):

Do you see where the receiver latency comes from? The wakeups are
quite similar in nature, assuming it's all wait(nr=1) and the CPUs
are not 100% consumed. Is it the hop back that spoils the scheduling
timing?
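
For context, the shape of the test being discussed is presumably
something like the sketch below: the sender MSG_RINGs a send timestamp
into the receiver's ring and the receiver blocks with wait(nr=1). Ring
sizes, loop count and the lack of SINGLE_ISSUER/DEFER_TASKRUN setup
flags are simplifications, not the actual test app:

#include <liburing.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define MSGS	1000

static struct io_uring rx_ring;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static void *receiver(void *arg)
{
	(void)arg;
	for (int i = 0; i < MSGS; i++) {
		struct io_uring_cqe *cqe;

		/* wait(nr=1): block until the MSG_RING CQE shows up */
		io_uring_wait_cqe(&rx_ring, &cqe);
		/* user_data carries the timestamp stamped by the sender */
		printf("receiver latency ~%llu ns\n",
		       (unsigned long long)(now_ns() - cqe->user_data));
		io_uring_cqe_seen(&rx_ring, cqe);
	}
	return NULL;
}

int main(void)
{
	struct io_uring tx_ring;
	pthread_t thr;

	io_uring_queue_init(1024, &rx_ring, 0);
	io_uring_queue_init(8, &tx_ring, 0);
	pthread_create(&thr, NULL, receiver, NULL);

	for (int i = 0; i < MSGS; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&tx_ring);
		struct io_uring_cqe *cqe;

		/* post a CQE into the receiver's ring, carrying the send
		 * time in the target CQE's user_data */
		io_uring_prep_msg_ring(sqe, rx_ring.ring_fd, 0, now_ns(), 0);
		io_uring_submit(&tx_ring);
		/* reap the sender-side completion of the MSG_RING op */
		io_uring_wait_cqe(&tx_ring, &cqe);
		io_uring_cqe_seen(&tx_ring, cqe);
	}
	pthread_join(thr, NULL);
	return 0;
}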


Latencies for: Receiver (msg=83863)
     percentiles (nsec):
      |  1.0000th=[  1208],  5.0000th=[  1336], 10.0000th=[  1400],
      | 20.0000th=[  1768], 30.0000th=[  1912], 40.0000th=[  1976],
      | 50.0000th=[  2040], 60.0000th=[  2160], 70.0000th=[  2256],
      | 80.0000th=[  2480], 90.0000th=[  2736], 95.0000th=[  3024],
      | 99.0000th=[  4080], 99.5000th=[  4896], 99.9000th=[  9664],
      | 99.9500th=[ 17024], 99.9900th=[218112]
Latencies for: Sender (msg=83863)
     percentiles (nsec):
      |  1.0000th=[  1928],  5.0000th=[  2064], 10.0000th=[  2160],
      | 20.0000th=[  2608], 30.0000th=[  2672], 40.0000th=[  2736],
      | 50.0000th=[  2864], 60.0000th=[  2960], 70.0000th=[  3152],
      | 80.0000th=[  3408], 90.0000th=[  4128], 95.0000th=[  4576],
      | 99.0000th=[  5920], 99.5000th=[  6752], 99.9000th=[ 13376],
      | 99.9500th=[ 22912], 99.9900th=[261120]

and the current edition does:

Latencies for: Sender (msg=94488)
     percentiles (nsec):
      |  1.0000th=[  181],  5.0000th=[  191], 10.0000th=[  201],
      | 20.0000th=[  215], 30.0000th=[  225], 40.0000th=[  235],
      | 50.0000th=[  262], 60.0000th=[  306], 70.0000th=[  430],
      | 80.0000th=[ 1004], 90.0000th=[ 2480], 95.0000th=[ 3632],
      | 99.0000th=[ 8096], 99.5000th=[12352], 99.9000th=[18048],
      | 99.9500th=[19584], 99.9900th=[23680]
Latencies for: Receiver (msg=94488)
     percentiles (nsec):
      |  1.0000th=[  342],  5.0000th=[  398], 10.0000th=[  482],
      | 20.0000th=[  652], 30.0000th=[  812], 40.0000th=[  972],
      | 50.0000th=[ 1240], 60.0000th=[ 1640], 70.0000th=[ 1944],
      | 80.0000th=[ 2448], 90.0000th=[ 3248], 95.0000th=[ 5216],
      | 99.0000th=[10304], 99.5000th=[12352], 99.9000th=[18048],
      | 99.9500th=[19840], 99.9900th=[23168]

If we cap it where v1 keeps up, at 13K messages per second, v1 does:

Latencies for: Receiver (msg=38820)
     percentiles (nsec):
      |  1.0000th=[ 1160],  5.0000th=[ 1256], 10.0000th=[ 1352],
      | 20.0000th=[ 1688], 30.0000th=[ 1928], 40.0000th=[ 1976],
      | 50.0000th=[ 2064], 60.0000th=[ 2384], 70.0000th=[ 2480],
      | 80.0000th=[ 2768], 90.0000th=[ 3280], 95.0000th=[ 3472],
      | 99.0000th=[ 4192], 99.5000th=[ 4512], 99.9000th=[ 6624],
      | 99.9500th=[ 8768], 99.9900th=[14272]
Latencies for: Sender (msg=38820)
     percentiles (nsec):
      |  1.0000th=[ 1848],  5.0000th=[ 1928], 10.0000th=[ 2040],
      | 20.0000th=[ 2608], 30.0000th=[ 2640], 40.0000th=[ 2736],
      | 50.0000th=[ 3024], 60.0000th=[ 3120], 70.0000th=[ 3376],
      | 80.0000th=[ 3824], 90.0000th=[ 4512], 95.0000th=[ 4768],
      | 99.0000th=[ 5536], 99.5000th=[ 6048], 99.9000th=[ 9024],
      | 99.9500th=[10304], 99.9900th=[23424]

and v2 does:

Latencies for: Sender (msg=39005)
     percentiles (nsec):
      |  1.0000th=[  191],  5.0000th=[  211], 10.0000th=[  262],
      | 20.0000th=[  342], 30.0000th=[  382], 40.0000th=[  402],
      | 50.0000th=[  450], 60.0000th=[  532], 70.0000th=[ 1080],
      | 80.0000th=[ 1848], 90.0000th=[ 4768], 95.0000th=[10944],
      | 99.0000th=[16512], 99.5000th=[18304], 99.9000th=[22400],
      | 99.9500th=[26496], 99.9900th=[41728]
Latencies for: Receiver (msg=39005)
     percentiles (nsec):
      |  1.0000th=[  410],  5.0000th=[  604], 10.0000th=[  700],
      | 20.0000th=[  900], 30.0000th=[ 1128], 40.0000th=[ 1320],
      | 50.0000th=[ 1672], 60.0000th=[ 2256], 70.0000th=[ 2736],
      | 80.0000th=[ 3760], 90.0000th=[ 5408], 95.0000th=[11072],
      | 99.0000th=[18304], 99.5000th=[20096], 99.9000th=[24704],
      | 99.9500th=[27520], 99.9900th=[35584]


--
Pavel Begunkov



