Re: [git pull] device mapper changes for 5.9

Just to bring in some more context: the primary trigger that made us look into this was high p99 read latency on a random read workload on modern-ish SATA SSD and NVMe disks. That is, on average things looked fine, but a fraction of requests, each needing only a small chunk of data fetched from the disk quickly, stalled for an unreasonable amount of time.
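
For reference, the kind of measurement that exposed this can be approximated
with a small user-space probe. The sketch below is purely illustrative (it is
not the harness we used; the sample count and block size are arbitrary): it
issues 4 KiB O_DIRECT reads at random offsets on a block device and reports
p50/p99 completion times.

/* readlat.c -- illustrative only: random 4 KiB O_DIRECT reads against a
 * block device, reporting p50/p99 latency.
 * Build: gcc -O2 -o readlat readlat.c
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#define SAMPLES 10000
#define BS 4096

static int cmp_u64(const void *a, const void *b)
{
        unsigned long long x = *(const unsigned long long *)a;
        unsigned long long y = *(const unsigned long long *)b;
        return (x > y) - (x < y);
}

int main(int argc, char **argv)
{
        if (argc != 2) {
                fprintf(stderr, "usage: %s <block device>\n", argv[0]);
                return 1;
        }
        int fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        unsigned long long dev_bytes = 0;
        if (ioctl(fd, BLKGETSIZE64, &dev_bytes) < 0) { perror("ioctl"); return 1; }
        unsigned long long nr_blocks = dev_bytes / BS;

        void *buf;
        if (posix_memalign(&buf, 4096, BS)) { perror("posix_memalign"); return 1; }

        static unsigned long long lat_ns[SAMPLES];
        srand48(getpid());
        for (int i = 0; i < SAMPLES; i++) {
                /* pick a random aligned 4 KiB block on the device */
                off_t off = (off_t)((unsigned long long)(drand48() * (double)nr_blocks) * BS);
                struct timespec t0, t1;
                clock_gettime(CLOCK_MONOTONIC, &t0);
                if (pread(fd, buf, BS, off) != BS) { perror("pread"); return 1; }
                clock_gettime(CLOCK_MONOTONIC, &t1);
                long long d = (t1.tv_sec - t0.tv_sec) * 1000000000LL +
                              (t1.tv_nsec - t0.tv_nsec);
                lat_ns[i] = (unsigned long long)d;
        }
        /* sort completion times and print the percentiles */
        qsort(lat_ns, SAMPLES, sizeof(lat_ns[0]), cmp_u64);
        printf("p50 %llu us, p99 %llu us\n",
               lat_ns[SAMPLES / 2] / 1000, lat_ns[SAMPLES * 99 / 100] / 1000);
        return 0;
}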

Most modern IO-intensive workloads have reasonable provisions for dealing with slow writes; when writing data we usually care more about average throughput, i.e. having enough throughput to get all incoming data to disk without losing it. In contrast, many modern workloads (distributed key-value stores, caching systems, etc.) need small chunks of data fetched fast, so the emphasis there is on read latency rather than write throughput. This is where we think the synchronous behaviour provides the most benefit.

Additionally, anyone who cares about latency will not run such a workload on HDDs, and HDD IO latency is orders of magnitude higher than CPU scheduling latency (a seek costs milliseconds, while waking a worker thread costs microseconds). So benchmarking on HDDs does not make much sense: the HDD latency will likely hide any improvement or degradation from the synchronous IO handling in dm-crypt.

But even latency-wise, in our testing with larger block sizes (>2M) synchronous IO (reads and writes) can show worse performance, and without fully understanding why, we're probably not ready to recommend it as a default.

Regards,
Ignat

On Tue, Aug 18, 2020 at 9:40 PM John Dorminy <jdorminy@xxxxxxxxxx> wrote:
For what it's worth, I just ran two tests on a machine with dm-crypt
using the cipher_null:ecb cipher. Results are mixed: not offloading IO
submission resulted in anywhere from a -27% to a +23% change in
throughput across a selection of IO patterns on HDDs and SSDs.

(Note that the IO submission thread also reorders IO to attempt to
submit it in sector order, so that is an additional difference between
the two modes -- it's not just "offload writes to another thread" vs
"don't offload writes".) The summary (for my FIO workloads focused on
parallelism) is that offloading is useful for high IO depth random
writes on SSDs, and for long sequential small writes on HDDs.
Offloading reduced throughput for immensely high IO depths on SSDs,
where I would guess lock contention is reducing effective IO depth to
the disk; and for low IO depths of sequential writes on HDDs, where I
would guess (as it would for a zoned device) preserving submission order
is better than attempting to reorder before submission.
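
To make the reordering point above concrete, here is a toy user-space sketch
of the idea (this is not the dm-crypt code itself; in the kernel the queued
writes are sorted before they are submitted). It just drains a batch of
queued writes and issues them in ascending sector order instead of arrival
order:

/* reorder.c -- toy illustration of "reorder before submit". */
#include <stdio.h>
#include <stdlib.h>

struct pending_write {
        unsigned long long sector;   /* 512-byte start sector */
        unsigned int nr_sectors;     /* length of the write   */
};

static int by_sector(const void *a, const void *b)
{
        const struct pending_write *x = a, *y = b;
        return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Stand-in for actual submission: in real code this would issue the IO. */
static void submit(const struct pending_write *w)
{
        printf("submit sector %llu (+%u)\n", w->sector, w->nr_sectors);
}

int main(void)
{
        /* Writes queued in arrival order (e.g. from many submitting CPUs). */
        struct pending_write batch[] = {
                { 81920, 8 }, { 512, 8 }, { 40960, 8 }, { 520, 8 }, { 16384, 8 },
        };
        size_t n = sizeof(batch) / sizeof(batch[0]);

        /* The submission thread drains the queue and sorts by start sector,
         * so the device sees a mostly-sequential stream. */
        qsort(batch, n, sizeof(batch[0]), by_sector);
        for (size_t i = 0; i < n; i++)
                submit(&batch[i]);
        return 0;
}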

Two test regimes: randwrite on 7x Samsung SSD 850 PRO 128G, somewhat
aged, behind an LSI MegaRAID card providing raid0; 6 processors
(Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) and 128G RAM. And seqwrite
on a software raid0 (512k chunk size) of 4 HDDs on the same machine.
Scheduler 'none' for both. LSI card in writethrough cache mode.
All data in MB/s.


depth   jobs   bs   dflt   no_wq   %chg   raw disk
--------------- randwrite, SSD ---------------
 128      1    4k    282     282      0    285
 256      4    4k    251     183    -27    283
2048      4    4k    266     283     +6    284
   1      4    1m    433     414     -4    403
--------------- seqwrite, HDD ----------------
 128      1    4k     87     107    +23     86
 256      4    4k    101      90    -11     91
2048      4    4k    273     233    -15    249
   1      4    1m    144     146     +1    146
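
(For clarity: %chg above appears to be the change of the no-workqueue mode
relative to the default, i.e. %chg = (no_wq - dflt) / dflt * 100. For
example, the 256/4/4k SSD row gives (183 - 251) / 251 ≈ -27%, and the
128/1/4k HDD row gives (107 - 87) / 87 ≈ +23%.)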

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
