24.09.2018, 23:00, "Coly Li" <colyli@xxxxxxx>:
> On 9/21/18 4:51 PM, Захаров Алексей wrote:
>> Hi all,
>>
>> I've tested bcache on Ubuntu 16.04 with the hwe-edge (4.15) kernel with fio.
>> While testing I found that fio with --sync=1 and libaio doesn't work as expected.
>>
>> Here is an example:
>> ~# uname -r
>> 4.15.0-34-generic
>>
>> Bcache is in writeback mode.
>> ~# cat /sys/class/block/bcache11/bcache/cache_mode
>> writethrough [writeback] writearound none
>>
>> First test, libaio and sync=0:
>> fio --name=test --iodepth=1 --numjobs=1 --direct=1 --filename=/dev/bcache11 --filesize=1G --blocksize=4k --rw=randwrite --sync=0 --ioengine=libaio
>> test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
>> fio-2.2.10
>> Starting 1 process
>> Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/61234KB/0KB /s] [0/15.4K/0 iops] [eta 00m:00s]
>>
>> iostat -xt 1 result while testing:
>> 09/19/18 21:52:26
>> avg-cpu: %user %nice %system %iowait %steal %idle
>>           0.22  0.00    1.35    0.00   0.00 98.43
>>
>> Device:    rrqm/s wrqm/s r/s      w/s rkB/s    wkB/s avgrq-sz   avgqu-sz await r_await w_await svctm %util
>> sdv          0.00   0.00 0.00     1.00  0.00     4.00     8.00       0.00  0.00    0.00    0.00  0.00  0.00
>> nvme0c33n1   0.00   0.00 2.00     0.00  8.00     0.00     8.00       0.00  0.00    0.00    0.00  0.00  0.00
>> bcache11     0.00   0.00 0.00 15894.00  0.00 63576.00     8.00 3098732.18  0.03    0.00    0.03  0.03 52.40
>>
>> Second test with libaio and sync=1:
>> fio --name=test --iodepth=1 --numjobs=1 --direct=1 --filename=/dev/bcache11 --filesize=1G --blocksize=4k --rw=randwrite --sync=1 --ioengine=libaio
>> test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
>> fio-2.2.10
>> Starting 1 process
>> Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/88123KB/0KB /s] [0/22.3K/0 iops] [eta 00m:00s]
>>
>> iostat while testing:
>> 09/19/18 21:54:17
>> avg-cpu: %user %nice %system %iowait %steal %idle
>>           0.19  0.00    1.16    0.00   0.00 98.65
>>
>> Device:    rrqm/s wrqm/s r/s      w/s rkB/s    wkB/s avgrq-sz   avgqu-sz await r_await w_await svctm %util
>> sdv          0.00   0.00 0.00     1.00  0.00     4.00     8.00       0.00  0.00    0.00    0.00  0.00  0.00
>> nvme0c33n1   0.00   0.00 2.00     0.00  8.00     0.00     8.00       0.00  0.00    0.00    0.00  0.00  0.00
>> bcache11     0.00   0.00 0.00 22118.00  0.00 88472.00     8.00 1014565.97  0.04    0.00    0.04  0.01 16.40
>>
>> Third test with fsync=1 and libaio:
>> fio --name=test --iodepth=1 --numjobs=1 --direct=1 --filename=/dev/bcache11 --filesize=1G --blocksize=4k --rw=randwrite --fsync=1 --ioengine=libaio
>> test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
>> fio-2.2.10
>> Starting 1 process
>> Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/21280KB/0KB /s] [0/5320/0 iops] [eta 00m:00s]
>>
>> iostat:
>> 09/19/18 21:56:52
>> avg-cpu: %user %nice %system %iowait %steal %idle
>>           0.19  0.00    0.91    1.38   0.00 97.52
>>
>> Device:    rrqm/s wrqm/s r/s      w/s rkB/s    wkB/s avgrq-sz   avgqu-sz await r_await w_await svctm %util
>> sdv          0.00   0.00 0.00  5959.00  0.00     4.00     0.00       0.00  0.09    0.00    0.09  0.00  0.00
>> nvme0c33n1   0.00   0.00 2.00     0.00  8.00     0.00     8.00       0.00  0.00    0.00    0.00  0.00  0.00
>> bcache11     0.00   0.00 0.00 11915.00  0.00 23832.00     4.00 1548362.98  0.06    0.00    0.06  0.02 23.20
>>
>> Fourth test with sync=1 and posixaio:
>> fio --name=test --iodepth=1 --numjobs=1 --direct=1 --filename=/dev/bcache11 --filesize=1G --blocksize=4k --rw=randwrite --sync=1 --ioengine=posixaio
>> test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
>> fio-2.2.10
>> Starting 1 process
>> Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/27080KB/0KB /s] [0/6770/0 iops] [eta 00m:00s]
>>
>> iostat:
>> 09/19/18 21:59:50
>> avg-cpu: %user %nice %system %iowait %steal %idle
>>           0.09  0.00    0.56    2.26   0.00 97.08
>>
>> Device:    rrqm/s wrqm/s r/s      w/s rkB/s    wkB/s avgrq-sz  avgqu-sz await r_await w_await svctm %util
>> sdv          0.00   0.00 0.00  6605.00  0.00     4.00     0.00      0.00  0.09    0.00    0.09  0.00  0.00
>> nvme0c33n1   0.00   1.00 2.00     3.00  8.00    12.50     8.20      0.00  0.00    0.00    0.00  0.00  0.00
>> bcache11     0.00   0.00 0.00 13208.00  0.00 26416.00     4.00 838177.72  0.07    0.00    0.07  0.01 11.60
>>
>> The results of the last two tests are understandable: per one fio write request we see a write request plus a flush request on the caching device and a flush request on the backing device. Fio IOPS are about 6K because of the slow backing device.
>> But the results of the first two tests look a bit weird to me: the test with sync=1 shows more IOPS than the test with sync=0, and there are no flush requests when sync=1.
>> I've tried to figure out whether fio opens the file with O_SYNC by running it under strace:
>> strace -e 'open' fio --name=test --iodepth=1 --numjobs=1 --direct=1 --filename=/dev/bcache11 --filesize=1G --blocksize=4k --rw=randwrite --sync=1 --ioengine=libaio
>> And I found that it is ok:
>> open("/dev/bcache11", O_RDWR|O_SYNC|O_DIRECT|O_NOATIME) = 3
>>
>> This behaviour is not reproducible on the 4.4 kernel, which is the default for Ubuntu 16.04.
>> Btw, avgqu-sz shows too-high values for bcache on 4.15, even if no operations are in progress.
>>
>> What could be the root cause of this behaviour? I thought that libaio might be the cause - it is not upgraded when the 4.15 kernel is installed - but it's just a guess.
>>
>> I can add full fio results, run any additional tests, or provide any other info if it helps.
>
> Hi Aleksei,
>
> We don't treat sync requests specially; in writeback mode, requests with
> REQ_SYNC still go into the cache device.
>
> I cannot provide an answer easily, but quite a lot of changes have happened
> between 4.15 and 4.18. Could you please try the latest stable kernel and
> check whether there is any difference?

I did some more tests and reinstalled the kernel packages, and I can't reproduce the previously seen behaviour (fortunately).

As far as I can see, for the 4.15 and 4.18 kernels:
--sync=1 --ioengine=libaio: write + flush to the cache device
--fsync=1 --ioengine=libaio: write + flush to the cache device, flush to the backing device

For the 4.4 kernel:
--sync=1 --ioengine=libaio: write + flush to the cache device, flush to the backing device
--fsync=1 --ioengine=libaio: write + flush to the cache device, flush to the backing device

Please correct me if I'm wrong: opening a file with the O_SYNC flag causes the REQ_SYNC flag to be set on every write IO, and sending a flush request causes the REQ_PREFLUSH flag to be set on the flush IO.
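To be precise about what I mean, here is a minimal C sketch of the two userspace cases I'm comparing (it writes to /dev/bcache11 like my fio runs, so it is destructive; the flag mapping in the comments is only my assumption about what the block layer does, not something I've verified in the code):

/* Case 1: O_SYNC writes; case 2: plain O_DIRECT write + explicit flush.
 * Buffer is 4k-aligned for O_DIRECT, matching my 4k block size. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    if (posix_memalign(&buf, 4096, 4096))
        return 1;
    memset(buf, 0xab, 4096);

    /* Case 1: O_SYNC -- every write IO should carry REQ_SYNC
     * (and, I assume, whatever is needed to make it durable). */
    int fd = open("/dev/bcache11", O_RDWR | O_DIRECT | O_SYNC);
    if (fd < 0)
        return 1;
    if (pwrite(fd, buf, 4096, 0) != 4096)
        return 1;
    close(fd);

    /* Case 2: plain O_DIRECT write, then an explicit flush --
     * the fdatasync() should show up as a separate flush request
     * with REQ_PREFLUSH on the device. */
    fd = open("/dev/bcache11", O_RDWR | O_DIRECT);
    if (fd < 0)
        return 1;
    if (pwrite(fd, buf, 4096, 0) != 4096)
        return 1;
    fdatasync(fd);
    close(fd);

    free(buf);
    return 0;
}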
>
> BTW, could you please tell me why you care about the performance of a
> single thread and iodepth=1? I almost never test performance for such a
> configuration.

I've been benchmarking bcache with NVMe drives, and I increased numjobs gradually. Increasing the iodepth setting leads to 100% CPU usage by a single fio process: it was ~50-60% CPU utilization with iodepth=1 and 100% with iodepth=4 on a raw NVMe device, and pidstat showed ~80% sys and ~20% usr for fio. So I decided to keep iodepth=1 and increase numjobs instead (an example command is at the end of this mail).

>
> Thanks.
>
> Coly Li
>
>> --
>> Regards,
>> Aleksei Zakharov

--
Regards,
Aleksei Zakharov
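P.S. For completeness, when scaling up I simply raise numjobs on the same kind of job, e.g. (the numjobs value here is only an example):
fio --name=test --iodepth=1 --numjobs=8 --direct=1 --filename=/dev/bcache11 --filesize=1G --blocksize=4k --rw=randwrite --ioengine=libaio --group_reporting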