On Tue, 24 May 2022, Christoph Hellwig wrote:
> On Tue, May 24, 2022 at 02:34:23PM -0700, Eric Wheeler wrote:
> > Is this flag influenced at all when /sys/block/sdX/queue/scheduler is set
> > to "none", or does the write_cache flag operate independently of the
> > selected scheduler?
>
> This is completely independent from the scheduler.
>
> > Does the block layer stop sending flushes at the first device in the stack
> > that is set to "write back"?  For example, if a device mapper target is
> > writeback will it strip flushes on the way to the backing device?
>
> This is up to the stacking driver.  dm and tend to pass through flushes
> where needed.
>
> > This confirms what I have suspected all along: We have an LSI MegaRAID
> > SAS-3516 where the write policy is "write back" in the LUN, but the cache
> > is flagged in Linux as write-through:
> >
> >     ]# cat /sys/block/sdb/queue/write_cache
> >     write through

Hi Keith, Christoph:

Adriano, who started this thread (cc'ed), reported that setting
queue/write_cache to "write back" results in much higher latency on his NVMe
than "write through"; I tested a system here and found the same thing.

Here is Adriano's summary:

    # cat /sys/block/nvme0n1/queue/write_cache
    write through
    # ioping -c10 /dev/nvme0n1 -D -Y -WWW -s4K
    ...
    min/avg/max/mdev = 60.0 us / 78.7 us / 91.2 us / 8.20 us
                                 ^^^^ ^^

    # for i in /sys/block/*/queue/write_cache; do echo 'write back' > $i; done
    # ioping -c10 /dev/nvme0n1 -D -Y -WWW -s4K
    ...
    min/avg/max/mdev = 1.81 ms / 1.89 ms / 2.01 ms / 82.3 us
                                 ^^^^ ^^

Interestingly, Adriano's average latency is 24.01x higher and ours is 23.97x
higher (see below).  These ~24x figures seem too similar to be a coincidence
on such different configurations.  He's running Linux 5.4 and we are on 4.19.

Is this expected?

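For anyone who wants to re-check the ratios, here is a minimal sketch that
divides the "write back" average by the "write through" average. The helper
name latency_ratio is hypothetical, and the inputs are the avg values from the
ioping output in this message, converted to microseconds by hand. Rounded to
two decimals the results can differ in the last digit from the figures quoted
above, but both land at roughly 24x.

```shell
# Hypothetical helper: ratio of two average latencies, both in microseconds.
#   $1 = baseline avg ("write through"), $2 = comparison avg ("write back")
latency_ratio() {
    awk -v base="$1" -v cmp="$2" 'BEGIN { printf "%.2f\n", cmp / base }'
}

# Adriano's bare NVMe: 78.7 us vs 1.89 ms (1890 us)
latency_ratio 78.7 1890

# md RAID1 + LVM stack: 754.9 us vs 18.1 ms (18100 us)
latency_ratio 754.9 18100
```
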
More info:

The stack where I verified the behavior Adriano reported is slightly
different: the NVMe devices are under md RAID1 with LVM on top, so baseline
latency is higher, but the relative latency increase with writeback enabled is
basically the same:

    ]# cat /sys/block/nvme[01]n1/queue/write_cache
    write through
    write through
    ]# ionice -c1 -n1 ioping -c10 /dev/ssd/ssd-test -D -s4k -WWW -Y
    ...
    min/avg/max/mdev = 119.1 us / 754.9 us / 2.67 ms / 1.02 ms

    ]# cat /sys/block/nvme[01]n1/queue/write_cache
    write back
    write back
    ]# ionice -c1 -n1 ioping -c10 /dev/ssd/ssd-test -D -s4k -WWW -Y
    ...
    min/avg/max/mdev = 113.4 us / 18.1 ms / 29.2 ms / 9.53 ms

--
Eric Wheeler