Re: [PATCH 00/16] block optimisation round


On 10/19/21 22:24, Pavel Begunkov wrote:
Jens tried out a similar series with some additions that haven't been
sent yet: 8.2-8.3 MIOPS -> ~9 MIOPS, or 8-10%.

12/16 is bulky, but it nicely drives the numbers. Moreover, with it
we can get rid of some optimisations in __blkdev_direct_IO() that are
no longer needed, because it now always serves multiple bios. E.g.
there is no need for the conditional referencing with DIO_MULTI_BIO,
and it can _probably_ be converted to chained bios, as sketched below.
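
To illustrate the chained-bio idea (a rough sketch only: the helper
name and arguments are made up, and error handling, write support and
polling are left out), the pattern would be the usual blk-lib.c style
loop, where each full bio is chained to its successor with bio_chain()
so that only the last bio needs a completion handler and no per-bio
bio_get()/bio_put() is required:

static void example_dio_submit_chained(struct block_device *bdev,
				       struct iov_iter *iter, loff_t pos,
				       bio_end_io_t *end_io, void *private)
{
	struct bio *bio = bio_alloc(GFP_KERNEL, BIO_MAX_VECS);
	struct bio *next;

	for (;;) {
		bio_set_dev(bio, bdev);
		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
		bio->bi_opf = REQ_OP_READ;
		bio_iov_iter_get_pages(bio, iter);

		if (!iov_iter_count(iter)) {
			/* the last bio carries the real completion */
			bio->bi_end_io = end_io;
			bio->bi_private = private;
			submit_bio(bio);
			return;
		}

		/* more to submit: chain this bio to the next one */
		next = bio_alloc(GFP_KERNEL, BIO_MAX_VECS);
		bio_chain(bio, next);
		pos += bio->bi_iter.bi_size;
		submit_bio(bio);
		bio = next;
	}
}
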
Some numbers below. Using null_blk is not perfect, but empirically,
judging from the numbers Jens posts, his Optane setup usually gives
comparable results in terms of % difference (probably divide the
percentage difference by 2 for the worst case).

modprobe null_blk no_sched=1 irqmode=1 completion_nsec=0 submit_queues=16 poll_queues=32
echo 0 > /sys/block/nullb0/queue/iostats
echo 2 > /sys/block/nullb0/queue/nomerges
nice -n -20 taskset -c 0 ./io_uring -d32 -s32 -c32 -p1 -B1 -F1 -b512 /dev/nullb0
# polled=1, fixedbufs=1, register_files=1, buffered=0 QD=32, sq_ring=32, cq_ring=64

# baseline (for-5.16/block)

IOPS=4304768, IOS/call=32/32, inflight=32 (32)
IOPS=4289824, IOS/call=32/32, inflight=32 (32)
IOPS=4227808, IOS/call=32/32, inflight=32 (32)
IOPS=4187008, IOS/call=32/32, inflight=32 (32)
IOPS=4196992, IOS/call=32/32, inflight=32 (32)
IOPS=4208384, IOS/call=32/32, inflight=32 (32)
IOPS=4233888, IOS/call=32/32, inflight=32 (32)
IOPS=4266432, IOS/call=32/32, inflight=32 (32)
IOPS=4232352, IOS/call=32/32, inflight=32 (32)

# + patch 14/16 (skip advance)

IOPS=4367424, IOS/call=32/32, inflight=0 (16)
IOPS=4401088, IOS/call=32/32, inflight=32 (32)
IOPS=4400544, IOS/call=32/32, inflight=0 (29)
IOPS=4400768, IOS/call=32/32, inflight=32 (32)
IOPS=4409568, IOS/call=32/32, inflight=32 (32)
IOPS=4373888, IOS/call=32/32, inflight=32 (32)
IOPS=4392544, IOS/call=32/32, inflight=32 (32)
IOPS=4368192, IOS/call=32/32, inflight=32 (32)
IOPS=4362976, IOS/call=32/32, inflight=32 (32)

Comparing profiles. Before:
+    1.75%  io_uring  [kernel.vmlinux]  [k] bio_iov_iter_get_pages
+    0.90%  io_uring  [kernel.vmlinux]  [k] iov_iter_advance

After:
+    0.91%  io_uring  [kernel.vmlinux]  [k] bio_iov_iter_get_pages_hint
[no iov_iter_advance]

# + patches 15,16 (switch optimisation)

IOPS=4485984, IOS/call=32/32, inflight=32 (32)
IOPS=4500384, IOS/call=32/32, inflight=32 (32)
IOPS=4524512, IOS/call=32/32, inflight=32 (32)
IOPS=4507424, IOS/call=32/32, inflight=32 (32)
IOPS=4497216, IOS/call=32/32, inflight=32 (32)
IOPS=4496832, IOS/call=32/32, inflight=32 (32)
IOPS=4505632, IOS/call=32/32, inflight=32 (32)
IOPS=4476224, IOS/call=32/32, inflight=32 (32)
IOPS=4478592, IOS/call=32/32, inflight=32 (32)
IOPS=4480128, IOS/call=32/32, inflight=32 (32)
IOPS=4468640, IOS/call=32/32, inflight=32 (32)

Before:
+    1.92%  io_uring  [kernel.vmlinux]  [k] submit_bio_checks
+    5.56%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio
After:
+    1.66%  io_uring  [kernel.vmlinux]  [k] submit_bio_checks
+    5.49%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio

Perf shows a ~0.3% difference (7.48% combined for the two functions
before vs 7.15% after), while the absolute IOPS numbers show ~2%,
which is most probably just a coincidence. 0.3% looks more realistic.


--
Pavel Begunkov


