On 05/04/2024 06.45, Damien Le Moal wrote:
> Performance evaluation results
> ==============================
>
> Environments:
> - Intel Xeon 16-cores/32-threads, 128GB of RAM
> - Kernel:
>   - ZWL (zone write locking, baseline): block/for-next (based on
>     6.9.0-rc2)
>   - ZWP: block/for-next patched kernel to add zone write plugging
>   (both kernels were compiled with the same configuration, turning
>   off most heavy debug features)
>
> Workloads:
> - seqw4K1: 4KB sequential write, qd=1
> - seqw4K16: 4KB sequential write, qd=16
> - seqw1M1: 1MB sequential write, qd=1
> - seqw1M16: 1MB sequential write, qd=16
> - rndw4K16: 4KB random write, qd=16
> - rndw128K16: 128KB random write, qd=16
> - btrfs workload: Single fio job writing 128 MB files using 128 KB
>   direct IOs at qd=16.
>
> Devices:
> - nullblk (zoned): 4096 zones of 256 MB, 128 max open zones.
> - NVMe ZNS drive: 1 TB ZNS drive with 2GB zone size, 14 max open and
>   active zones.
> - SMR HDD: 20 TB disk with 256MB zone size, 128 max open zones.
>
> For ZWP, the results show the performance percentage increase (or
> decrease) against the ZWL (baseline) case.
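Each percentage in the tables is the relative change of a ZWP result against the ZWL (mq-deadline) result for the same workload and device. A minimal Python sketch of that arithmetic, using three null_blk cells from the first table; `pct_change` is a hypothetical helper, not part of any benchmark tooling:

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of `value` against `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# null_blk results: ZWL with mq-deadline (baseline) vs. ZWP with none
zwl_mq_deadline = {"seqw4K1": 940, "seqw4K16": 840, "rndw4K16": 424}
zwp_none = {"seqw4K1": 2639, "seqw4K16": 1715, "rndw4K16": 344}

for workload, base in zwl_mq_deadline.items():
    print(f"{workload}: {pct_change(base, zwp_none[workload]):+.1f}%")
```

This prints +180.7%, +104.2% and -18.9%. The table lists these cells as +180%, +104% and -18%, so its whole-number figures appear to be truncated rather than rounded.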
>
> 1) null_blk zoned device:
>
>             +--------+--------+-------+--------+--------+----------+
>             |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
>             | (MB/s) | (MB/s) |(MB/s) | (MB/s) | (KIOPS)|  (KIOPS) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWL    |    940 |    840 | 18550 |  14400 |    424 |      167 |
> |mq-deadline|        |        |       |        |        |          |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    943 |    845 | 18660 |  14770 |    461 |      165 |
> |mq-deadline|  (+0%) |  (+0%) | (+0%) |  (+1%) |  (+8%) |    (-1%) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    756 |    668 | 16020 |  12980 |    135 |      101 |
> |    bfq    | (-19%) | (-20%) |(-13%) |  (-9%) | (-68%) |   (-39%) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |   2639 |   1715 | 28190 |  19760 |    344 |      150 |
> |   none    |(+180%) |(+104%) |(+51%) | (+37%) | (-18%) |   (-10%) |
> +-----------+--------+--------+-------+--------+--------+----------+
>
> ZWP with mq-deadline gives performance very similar to zone write
> locking, showing that the zone write plugging overhead is acceptable.
> But ZWP's ability to run fast block devices with the none scheduler
> brings all the benefits of zone write plugging and results in
> significant performance increases for all sequential write workloads.
> The exceptions are the random write workloads with multiple jobs: for
> these, the faster request submission rate achieved by zone write
> plugging results in higher contention on the null_blk zone spinlock,
> which degrades performance.
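Zone write locking relies on the I/O scheduler (mq-deadline) to allow only one write per zone in flight, whereas zone write plugging orders writes per zone in the block layer, before any scheduler sees them, which is what makes the none scheduler usable. The toy Python model below illustrates that idea under simplifying assumptions: each zone has a plug that lets a single write be in flight and holds the following ones until it completes, while different zones proceed independently. `Bio`, `ZoneWritePlug` and `ToyZonedQueue` are hypothetical names for this sketch, not the kernel's data structures.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Bio:
    zone: int        # target zone number
    sector: int      # start offset within the zone, in sectors
    nr_sectors: int  # write size, in sectors


@dataclass
class ZoneWritePlug:
    """Hypothetical per-zone plug state (not the kernel's data structure)."""
    wp: int = 0                      # expected offset of the next write
    in_flight: Optional[Bio] = None  # the single write issued for this zone
    plugged: Deque[Bio] = field(default_factory=deque)


class ToyZonedQueue:
    """Toy model: BIOs are ordered per zone before any scheduler sees them."""

    def __init__(self) -> None:
        self.plugs: Dict[int, ZoneWritePlug] = {}
        self.issued: List[Bio] = []  # dispatch order as seen by the device

    def submit(self, bio: Bio) -> None:
        plug = self.plugs.setdefault(bio.zone, ZoneWritePlug())
        if plug.in_flight is None:
            self._issue(plug, bio)
        else:
            plug.plugged.append(bio)  # hold until the in-flight write completes

    def complete(self, zone: int) -> None:
        plug = self.plugs[zone]
        if plug.in_flight is None:
            raise RuntimeError(f"no write in flight for zone {zone}")
        plug.in_flight = None
        if plug.plugged:
            self._issue(plug, plug.plugged.popleft())  # unplug the next BIO

    def _issue(self, plug: ZoneWritePlug, bio: Bio) -> None:
        if bio.sector != plug.wp:
            raise ValueError(f"unaligned write at {bio.sector}, wp is {plug.wp}")
        plug.wp += bio.nr_sectors  # advance the write pointer on issue
        plug.in_flight = bio
        self.issued.append(bio)


q = ToyZonedQueue()
for bio in (Bio(0, 0, 8), Bio(0, 8, 8), Bio(1, 0, 8), Bio(0, 16, 8)):
    q.submit(bio)
print([(b.zone, b.sector) for b in q.issued])  # [(0, 0), (1, 0)]
q.complete(0)
print([(b.zone, b.sector) for b in q.issued])  # [(0, 0), (1, 0), (0, 8)]
```

Because ordering is enforced per zone rather than per device, the write to zone 1 is dispatched while zone 0 still has plugged writes, so the dispatching layer below needs no zone awareness of its own.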
>
> 2) NVMe ZNS drive:
>
>             +--------+--------+-------+--------+--------+----------+
>             |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
>             | (MB/s) | (MB/s) |(MB/s) | (MB/s) | (KIOPS)|  (KIOPS) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWL    |    183 |    702 |  1066 |   1103 |   53.5 |     14.5 |
> |mq-deadline|        |        |       |        |        |          |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    183 |    719 |  1086 |   1108 |   55.6 |     14.7 |
> |mq-deadline|  (-0%) |  (+1%) | (+0%) |  (+0%) |  (+3%) |    (+1%) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    178 |    691 |  1082 |   1106 |   30.8 |     11.5 |
> |    bfq    |  (-3%) |  (-2%) | (-0%) |  (+0%) | (-42%) |   (-20%) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    190 |    666 |  1083 |   1108 |   51.4 |     14.7 |
> |   none    |  (+4%) |  (-5%) | (+0%) |  (+0%) |  (-4%) |    (+0%) |
> +-----------+--------+--------+-------+--------+--------+----------+
>
> Zone write plugging overhead does not significantly impact performance.
> As with null_blk, the none scheduler can be used, although here it
> brings no significant gain: all results stay within 5% of the baseline.
>
> 3) SMR SATA HDD:
>
>             +--------+--------+-------+--------+--------+----------+
>             |seqw4K1 |seqw4K16|seqw1M1|seqw1M16|rndw4K16|rndw128K16|
>             | (MB/s) | (MB/s) |(MB/s) | (MB/s) | (KIOPS)|  (KIOPS) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWL    |    107 |    243 |   245 |    246 |    2.2 |    0.763 |
> |mq-deadline|        |        |       |        |        |          |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    107 |    242 |   245 |    245 |    2.2 |    0.772 |
> |mq-deadline|  (+0%) |  (-0%) | (+0%) |  (-0%) |  (+0%) |    (+0%) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    104 |    241 |   246 |    242 |    2.2 |    0.765 |
> |    bfq    |  (-2%) |  (-0%) | (+0%) |  (-0%) |  (+0%) |    (+0%) |
> +-----------+--------+--------+-------+--------+--------+----------+
> |    ZWP    |    115 |    235 |   249 |    242 |    2.2 |    0.763 |
> |   none    |  (+7%) |  (-3%) | (+1%) |  (-1%) |  (+0%) |    (+0%) |
> +-----------+--------+--------+-------+--------+--------+----------+
>
> Performance with purely sequential write workloads at high queue depth
> decreases slightly when using zone write plugging. This is due to the
> different IO pattern that ZWP generates, where the first writes to a
> zone start being issued while the end of the previous zone is still
> being written. Depending on how the disk handles queued commands,
> seeks may be generated, slightly reducing the achieved throughput.
> Such purely sequential write workloads are, however, rare with SMR
> drives.
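The btrfs results below distinguish native zone append (ZA), where the device chooses the write location and reports it on completion, from emulated ZA for devices or configurations without native support. The sketch below is a simplified Python model, assuming emulation works by turning each append into a regular write at the zone's current write pointer; `EmulatedAppendZone` and its fields are hypothetical illustrations, not kernel code. Sizes match the 256 MB zones and 128 KB btrfs IOs from the setup above.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # bytes; sector unit used by the Linux block layer


@dataclass
class EmulatedAppendZone:
    """Hypothetical toy: emulate zone append with regular writes."""
    start: int          # first sector of the zone
    capacity: int       # usable sectors in the zone
    wp_offset: int = 0  # write pointer, relative to `start`

    def append(self, nr_sectors: int) -> int:
        """Place an append at the write pointer and return its start sector.

        Native zone append lets the device pick the location; here the host
        picks it. This assumes nothing else writes to the zone concurrently,
        which is what per-zone write plugging is meant to guarantee.
        """
        if nr_sectors <= 0:
            raise ValueError("nr_sectors must be positive")
        if self.wp_offset + nr_sectors > self.capacity:
            raise ValueError("append would exceed the zone capacity")
        sector = self.start + self.wp_offset
        self.wp_offset += nr_sectors  # the emulated write lands here
        return sector


# 256 MB zones (null_blk and SMR setups) and 128 KB writes (btrfs job)
zone_sectors = 256 * 1024 * 1024 // SECTOR_SIZE  # 524288
io_sectors = 128 * 1024 // SECTOR_SIZE           # 256
zone1 = EmulatedAppendZone(start=1 * zone_sectors, capacity=zone_sectors)
print(zone1.append(io_sectors), zone1.append(io_sectors))  # 524288 524544
```

Returning the chosen sector mirrors what a native zone append completion gives the file system, so the caller does not need to know which path was taken.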
>
> 4) Zone append tests using btrfs:
>
>             +-------------+-------------+-----------+-------------+
>             |  null_blk   |  null_blk   |    ZNS    |     SMR     |
>             |  native ZA  | emulated ZA | native ZA | emulated ZA |
>             |   (MB/s)    |   (MB/s)    |  (MB/s)   |   (MB/s)    |
> +-----------+-------------+-------------+-----------+-------------+
> |    ZWL    |        2441 |         N/A |      1081 |         243 |
> |mq-deadline|             |             |           |             |
> +-----------+-------------+-------------+-----------+-------------+
> |    ZWP    |        2361 |        2999 |      1085 |         239 |
> |mq-deadline|       (-1%) |             |     (+0%) |       (-2%) |
> +-----------+-------------+-------------+-----------+-------------+
> |    ZWP    |        2299 |        2730 |      1080 |         240 |
> |    bfq    |       (-4%) |             |     (+0%) |       (-2%) |
> +-----------+-------------+-------------+-----------+-------------+
> |    ZWP    |        2443 |        3152 |      1083 |         240 |
> |   none    |       (+0%) |             |     (+0%) |       (-1%) |
> +-----------+-------------+-------------+-----------+-------------+
>
> With a more realistic use of the device through a file system, ZWP
> does not introduce significant performance differences, except for
> SMR, for the same reason as with the fio sequential workloads at high
> queue depth.
>

I ran some fio performance tests across several NVMe ZNS devices on my
bare metal setup with this patch set. In my tests, I ran seqw, seqr
and rndr workloads (sequential write, sequential read and random read)
with a range of block sizes and varying numbers of concurrent jobs,
for both the none and mq-deadline schedulers. The results are
consistent with the ones you posted here. Performance improvements are
most noticeable for rndr workloads.

Looks great!

Dennis

Tested-by: Dennis Maisenbacher <dennis.maisenbacher@xxxxxxx>