Re: [PATCH v2 for-next 0/8] io_uring: tw contention improvements

On 6/22/22 23:21, Jens Axboe wrote:
On 6/22/22 7:40 AM, Dylan Yudaken wrote:
Task work currently uses a spin lock to guard task_list and
task_running. Some use cases such as networking can trigger task_work_add
from multiple threads all at once, which suffers from contention here.

This can be changed to use a lockless list which seems to have better
performance. Running the microbenchmark in [1], I see a 20% improvement in
multithreaded task work add. It required removing the priority tw list
optimisation; however, it isn't clear how important that optimisation is.
Additionally, it has semantics that are fairly easy to break.
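The core idea, replacing a spin-lock-guarded list with a lockless one, can be sketched in user space with C11 atomics. This is a minimal illustration modelled on the semantics of the kernel's llist API (llist_add(), llist_del_all(), llist_reverse_order()), not the code from these patches; the names lnode, lhead, lpush(), ltake_all() and lreverse() are hypothetical. Producers prepend with a compare-and-swap loop instead of taking a lock, and the consumer detaches the whole list with a single atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space analogue of a kernel llist node. */
struct lnode {
	struct lnode *next;
	int value;              /* payload; stands in for a task_work item */
};

/* List head: a single atomic pointer, no lock. */
struct lhead {
	_Atomic(struct lnode *) first;
};

/* Push one node from any thread. Returns nonzero if the list was empty
 * beforehand (mirroring llist_add()), e.g. to decide whether the
 * consumer needs a wakeup. */
static int lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old;

	assert(n != NULL);
	old = atomic_load_explicit(&h->first, memory_order_relaxed);
	do {
		n->next = old;  /* on CAS failure, old holds the new head */
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &old, n,
			memory_order_release, memory_order_relaxed));
	return old == NULL;
}

/* Detach the whole list in one atomic exchange; the caller owns it. */
static struct lnode *ltake_all(struct lhead *h)
{
	return atomic_exchange_explicit(&h->first, (struct lnode *)NULL,
					memory_order_acquire);
}

/* Pushes prepend, so a detached batch is newest-first; reverse it to
 * process items in the order they were added. */
static struct lnode *lreverse(struct lnode *n)
{
	struct lnode *prev = NULL;

	while (n) {
		struct lnode *next = n->next;

		n->next = prev;
		prev = n;
		n = next;
	}
	return prev;
}
```

Pushing and detaching the whole list can run concurrently without a lock; popping individual nodes from the head would need extra care (the ABA problem), which is why this pattern only takes the entire list at once.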

Patches 1-2 remove the priority tw list optimisation
Patches 3-5 add lockless lists for task work
Patch 6 fixes a bug I noticed in io_uring event tracing
Patches 7-8 add tracing for task_work_run

I ran some IRQ-driven workloads on this: basic 512b random reads, DIO,
IRQ completions, at queue depths 1-64, doubling each time. Once we get to
QD=8, submit/complete batches are 1/4 of the QD, so batching ramps up
there too. Results below; the first set is 5.19-rc3 + for-5.20/io_uring,
the second set is that plus this series.

This is what I ran:

sudo taskset -c 12 t/io_uring -d<QD> -b512 -s<batch> -c<batch> -p0 -F1 -B1 -n1 -D0 -R0 -X1 -R1 -t1 -r5 /dev/nvme0n1

on a gen2 Optane drive.
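The queue-depth sweep described above can be written out as a small C helper that emits one command line per run. The batch rule (1 below QD=8, QD/4 from QD=8 on) is read off the description and matches the QD/Batch pairs in the results; the helper names batch_for_qd() and print_sweep() are hypothetical, and the flags are copied verbatim from the command above.

```c
#include <assert.h>
#include <stdio.h>

/* Submit/complete batch for a given queue depth: 1 below QD=8,
 * then a quarter of the queue depth. */
static int batch_for_qd(int qd)
{
	assert(qd > 0);
	return qd >= 8 ? qd / 4 : 1;
}

/* Print the t/io_uring command for each QD in 1, 2, 4, ..., 64 and
 * return the number of runs in the sweep. */
static int print_sweep(FILE *out)
{
	int runs = 0;

	for (int qd = 1; qd <= 64; qd *= 2) {
		int b = batch_for_qd(qd);

		fprintf(out, "sudo taskset -c 12 t/io_uring -d%d -b512 -s%d -c%d "
			"-p0 -F1 -B1 -n1 -D0 -R0 -X1 -R1 -t1 -r5 /dev/nvme0n1\n",
			qd, b, b);
		runs++;
	}
	return runs;
}
```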

tldr - looks like an improvement there too, and no ill effects seen on
latency.

Looks like it, nice.


5.19-rc3 + for-5.20/io_uring:

QD=1, Batch=1
Maximum IOPS=244K
1509: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3996],  5.0000th=[ 3996], 10.0000th=[ 3996],
      | 20.0000th=[ 4036], 30.0000th=[ 4036], 40.0000th=[ 4036],
      | 50.0000th=[ 4036], 60.0000th=[ 4036], 70.0000th=[ 4036],
      | 80.0000th=[ 4076], 90.0000th=[ 4116], 95.0000th=[ 4196],
      | 99.0000th=[ 4437], 99.5000th=[ 5421], 99.9000th=[ 7590],
      | 99.9500th=[ 9518], 99.9900th=[32289]

QD=2, Batch=1
Maximum IOPS=483K
1533: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3714],  5.0000th=[ 3755], 10.0000th=[ 3795],
      | 20.0000th=[ 3795], 30.0000th=[ 3835], 40.0000th=[ 3955],
      | 50.0000th=[ 4036], 60.0000th=[ 4076], 70.0000th=[ 4076],
      | 80.0000th=[ 4076], 90.0000th=[ 4116], 95.0000th=[ 4156],
      | 99.0000th=[ 4518], 99.5000th=[ 6144], 99.9000th=[ 7510],
      | 99.9500th=[ 9839], 99.9900th=[32289]

QD=4, Batch=1
Maximum IOPS=907K
1583: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3393],  5.0000th=[ 3514], 10.0000th=[ 3594],
      | 20.0000th=[ 3634], 30.0000th=[ 3795], 40.0000th=[ 3875],
      | 50.0000th=[ 3955], 60.0000th=[ 4076], 70.0000th=[ 4156],
      | 80.0000th=[ 4277], 90.0000th=[ 4397], 95.0000th=[ 4477],
      | 99.0000th=[ 5120], 99.5000th=[ 5903], 99.9000th=[ 9357],
      | 99.9500th=[11004], 99.9900th=[32289]

QD=8, Batch=2
Maximum IOPS=1688K
1631: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3353],  5.0000th=[ 3554], 10.0000th=[ 3634],
      | 20.0000th=[ 3755], 30.0000th=[ 3875], 40.0000th=[ 4036],
      | 50.0000th=[ 4156], 60.0000th=[ 4277], 70.0000th=[ 4437],
      | 80.0000th=[ 4678], 90.0000th=[ 4839], 95.0000th=[ 5040],
      | 99.0000th=[ 6305], 99.5000th=[ 7028], 99.9000th=[10080],
      | 99.9500th=[15502], 99.9900th=[32932]

QD=16, Batch=4
Maximum IOPS=2613K
1680: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3955],  5.0000th=[ 4397], 10.0000th=[ 4558],
      | 20.0000th=[ 4759], 30.0000th=[ 4959], 40.0000th=[ 5120],
      | 50.0000th=[ 5261], 60.0000th=[ 5502], 70.0000th=[ 5743],
      | 80.0000th=[ 5903], 90.0000th=[ 6305], 95.0000th=[ 6706],
      | 99.0000th=[ 8393], 99.5000th=[ 8955], 99.9000th=[11325],
      | 99.9500th=[31968], 99.9900th=[34217]

QD=32, Batch=8
Maximum IOPS=3573K
1706: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 4919],  5.0000th=[ 5662], 10.0000th=[ 5903],
      | 20.0000th=[ 6144], 30.0000th=[ 6465], 40.0000th=[ 6626],
      | 50.0000th=[ 6867], 60.0000th=[ 7188], 70.0000th=[ 7510],
      | 80.0000th=[ 7992], 90.0000th=[ 8714], 95.0000th=[ 9357],
      | 99.0000th=[11325], 99.5000th=[11967], 99.9000th=[16626],
      | 99.9500th=[34217], 99.9900th=[37108]

QD=64, Batch=16
Maximum IOPS=3953K
1735: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 6626],  5.0000th=[ 7188], 10.0000th=[ 7510],
      | 20.0000th=[ 7992], 30.0000th=[ 8393], 40.0000th=[ 9116],
      | 50.0000th=[10160], 60.0000th=[11164], 70.0000th=[11646],
      | 80.0000th=[12128], 90.0000th=[12931], 95.0000th=[13735],
      | 99.0000th=[15984], 99.5000th=[16787], 99.9000th=[34217],
      | 99.9500th=[38072], 99.9900th=[40964]


============


5.19-rc3 + for-5.20/io_uring + this series:

QD=1, Batch=1
Maximum IOPS=246K
909: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3955],  5.0000th=[ 3996], 10.0000th=[ 3996],
      | 20.0000th=[ 3996], 30.0000th=[ 3996], 40.0000th=[ 3996],
      | 50.0000th=[ 3996], 60.0000th=[ 3996], 70.0000th=[ 4036],
      | 80.0000th=[ 4036], 90.0000th=[ 4076], 95.0000th=[ 4116],
      | 99.0000th=[ 4196], 99.5000th=[ 5341], 99.9000th=[ 7590],
      | 99.9500th=[ 9357], 99.9900th=[32289]

QD=2, Batch=1
Maximum IOPS=487K
932: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3714],  5.0000th=[ 3755], 10.0000th=[ 3755],
      | 20.0000th=[ 3755], 30.0000th=[ 3795], 40.0000th=[ 3795],
      | 50.0000th=[ 3996], 60.0000th=[ 4036], 70.0000th=[ 4036],
      | 80.0000th=[ 4036], 90.0000th=[ 4076], 95.0000th=[ 4116],
      | 99.0000th=[ 4437], 99.5000th=[ 6224], 99.9000th=[ 7510],
      | 99.9500th=[ 9598], 99.9900th=[32289]

QD=4, Batch=1
Maximum IOPS=921K
955: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3393],  5.0000th=[ 3433], 10.0000th=[ 3514],
      | 20.0000th=[ 3594], 30.0000th=[ 3674], 40.0000th=[ 3795],
      | 50.0000th=[ 3875], 60.0000th=[ 3996], 70.0000th=[ 4036],
      | 80.0000th=[ 4156], 90.0000th=[ 4317], 95.0000th=[ 4678],
      | 99.0000th=[ 5120], 99.5000th=[ 5903], 99.9000th=[ 9116],
      | 99.9500th=[10522], 99.9900th=[32289]

QD=8, Batch=2
Maximum IOPS=1658K
981: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3313],  5.0000th=[ 3514], 10.0000th=[ 3594],
      | 20.0000th=[ 3714], 30.0000th=[ 3835], 40.0000th=[ 3996],
      | 50.0000th=[ 4116], 60.0000th=[ 4196], 70.0000th=[ 4397],
      | 80.0000th=[ 4598], 90.0000th=[ 4718], 95.0000th=[ 4919],
      | 99.0000th=[ 6385], 99.5000th=[ 6947], 99.9000th=[10000],
      | 99.9500th=[15180], 99.9900th=[32932]

QD=16, Batch=4
Maximum IOPS=2749K
1010: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 3955],  5.0000th=[ 4437], 10.0000th=[ 4558],
      | 20.0000th=[ 4759], 30.0000th=[ 4959], 40.0000th=[ 5120],
      | 50.0000th=[ 5261], 60.0000th=[ 5502], 70.0000th=[ 5743],
      | 80.0000th=[ 5903], 90.0000th=[ 6224], 95.0000th=[ 6626],
      | 99.0000th=[ 8313], 99.5000th=[ 9036], 99.9000th=[11967],
      | 99.9500th=[32289], 99.9900th=[34217]

QD=32, Batch=8
Maximum IOPS=3583K
1050: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 4879],  5.0000th=[ 5582], 10.0000th=[ 5903],
      | 20.0000th=[ 6224], 30.0000th=[ 6465], 40.0000th=[ 6626],
      | 50.0000th=[ 6787], 60.0000th=[ 7028], 70.0000th=[ 7349],
      | 80.0000th=[ 7911], 90.0000th=[ 8634], 95.0000th=[ 9196],
      | 99.0000th=[11164], 99.5000th=[11967], 99.9000th=[16305],
      | 99.9500th=[34217], 99.9900th=[37108]

QD=64, Batch=16
Maximum IOPS=3959K
1081: Latency percentiles:
     percentiles (nsec):
      |  1.0000th=[ 6546],  5.0000th=[ 7108], 10.0000th=[ 7429],
      | 20.0000th=[ 7992], 30.0000th=[ 8313], 40.0000th=[ 8955],
      | 50.0000th=[10000], 60.0000th=[11004], 70.0000th=[11646],
      | 80.0000th=[12128], 90.0000th=[12931], 95.0000th=[13735],
      | 99.0000th=[15984], 99.5000th=[16787], 99.9000th=[33253],
      | 99.9500th=[38072], 99.9900th=[41446]
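
To compare the two result sets point by point, the relative change in maximum IOPS is 100 * (new - old) / old. A minimal C sketch with the figures transcribed from the two sets above (in K IOPS); the array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum IOPS (K) per queue depth, transcribed from the results above. */
static const int qds[]         = { 1, 2, 4, 8, 16, 32, 64 };
static const int iops_base[]   = { 244, 483, 907, 1688, 2613, 3573, 3953 };
static const int iops_series[] = { 246, 487, 921, 1658, 2749, 3583, 3959 };
#define NPOINTS (sizeof(qds) / sizeof(qds[0]))

/* Relative change in percent from a baseline value to a new value. */
static double pct_change(int before, int after)
{
	assert(before > 0);
	return 100.0 * (after - before) / before;
}

/* Fill out[i] with the IOPS change at qds[i]; returns the point count. */
static size_t iops_deltas(double out[NPOINTS])
{
	for (size_t i = 0; i < NPOINTS; i++)
		out[i] = pct_change(iops_base[i], iops_series[i]);
	return NPOINTS;
}
```

With these numbers the change ranges from about -1.8% at QD=8 to about +5.2% at QD=16, with the remaining points between +0.15% and +1.54%.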




