On Tue, 24 May 2022, Keith Busch wrote:
> On Tue, May 24, 2022 at 01:14:18PM -0700, Eric Wheeler wrote:
> > Hi Christoph,
> >
> > On Mon, 23 May 2022, Christoph Hellwig wrote:
> > > ... wait.
> > >
> > > Can someone explain what this is all about? Devices with power fail
> > > protection will advertise that (using VWC flag in NVMe for example) and
> > > we will never send flushes. So anything that explicitly disables flushes
> > > will generally cause data corruption.
> >
> > Adriano was getting 1.5ms sync-write ioping's to an NVMe through bcache
> > (instead of the expected ~70us), so perhaps the NVMe flushes were killing
> > performance if every write was also forcing an erase cycle.
> >
> > The suggestion was to disable flushes in bcache as a troubleshooting step
> > to see if that solved the problem, but with the warning that it could be
> > unsafe.
> >
> > Questions:
> >
> > 1. If a user knows their disks have a non-volatile cache then is it safe
> >    to drop flushes?
> >
> > 2. If not, then under what circumstances is it unsafe with a non-volatile
> >    cache?
> >
> > 3. Since the block layer won't send flushes when the hardware reports that
> >    the cache is non-volatile, then how do you query the device to make
> >    sure it is reporting correctly? For NVMe you can get VWC as:
> >        nvme id-ctrl -H /dev/nvme0 |grep -A1 vwc
> >
> >    ...but how do you query a block device (like a RAID LUN) to make sure
> >    it is reporting a non-volatile cache correctly?
>
> You can check the queue attribute, /sys/block/<disk>/queue/write_cache. If the
> value is "write through", then the device is reporting it doesn't have a
> volatile cache. If it is "write back", then it has a volatile cache.

Thanks, Keith!

Is this flag influenced at all when /sys/block/sdX/queue/scheduler is set to
"none", or does the write_cache flag operate independently of the selected
scheduler?

Does the block layer stop sending flushes at the first device in the stack
that is set to "write back"?
For example, if a device mapper target is writeback, will it strip flushes
on the way to the backing device?

This confirms what I have suspected all along: We have an LSI MegaRAID
SAS-3516 where the write policy is "write back" in the LUN, but the cache is
flagged in Linux as write-through:

	]# cat /sys/block/sdb/queue/write_cache
	write through

I guess this is the correct place to adjust that behavior!

--
Eric Wheeler
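
The per-device check Keith describes can be repeated across every disk in
one pass. The following is only a minimal sketch: the function name
report_write_cache and its optional directory argument are hypothetical
(the argument exists so the loop can be pointed at a test tree instead of
/sys/block), and the annotations it prints simply restate Keith's
explanation of the two values.

```shell
#!/bin/sh
# Sketch: print the kernel's view of the write cache for each block device.
# report_write_cache is a hypothetical helper name, not an existing tool.
# Optional argument: directory to scan (assumed default: /sys/block).

report_write_cache() {
    root="${1:-/sys/block}"
    for attr in "$root"/*/queue/write_cache; do
        # Skip the unexpanded glob pattern and unreadable attributes.
        [ -r "$attr" ] || continue

        # Derive the disk name from <root>/<disk>/queue/write_cache.
        disk="${attr#"$root"/}"
        disk="${disk%%/*}"

        mode="$(cat "$attr")"
        case "$mode" in
            "write back")    note="volatile cache reported" ;;
            "write through") note="no volatile cache reported" ;;
            *)               note="unrecognized value" ;;
        esac
        printf '%s: %s (%s)\n' "$disk" "$mode" "$note"
    done
}

report_write_cache "$@"
```

This only reads each device's own attribute. It says nothing about how
flushes propagate through a stacked device (bcache or device mapper on top
of a RAID LUN), which is the open question above, so each layer has to be
inspected separately. As far as I know, the kernel's queue-sysfs
documentation describes write_cache as writable, with a write changing only
the kernel's view of the device and not the device state itself; treat that
as something to verify against your kernel version before relying on it.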