Thank you! I'm glad that we've established it's a mismatch between our
device's implementation and XFS's expectations.

> .... XFS issues log writes with REQ_PREFLUSH|REQ_FUA. This means
> sequentially issued log writes have clearly specified ordering
> constraints. i.e. the preflush completion order requirements means
> that the block device must commit preflush+write+fua bios to stable
> storage in the exact order they were issued by the filesystem....

That is certainly what REQ_BARRIER did back in the day. But when
REQ_BARRIER was replaced with the separate REQ_FUA and REQ_FLUSH flags,
and barrier.txt was replaced with writeback_cache_control.txt, the
documentation seemed to imply the ordering requirement on *issued* IO
had gone away (but maybe I'm missing something).

Quoth writeback_cache_control.txt about REQ_PREFLUSH:

> will make sure the volatile cache of the storage device has been
> flushed before the actual I/O operation is started. This explicitly
> guarantees that previously completed write requests are on
> non-volatile storage before the flagged bio starts.

And about REQ_FUA:

> I/O completion for this request is only signaled after the data has
> been committed to non-volatile storage.

I am perhaps overlooking where REQ_PREFLUSH guarantees that all
previously *issued* write requests with FLUSH|FUA are stable, not just
all previously *completed* ones. Is this documented somewhere?
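To make the gap concrete, here is a minimal sketch of two journal
writes issued back to back with the flags in question. This is not
XFS's actual log submission code; submit_log_bio() is a hypothetical
helper wrapping the real submit_bio():

	#include <linux/bio.h>
	#include <linux/blk_types.h>

	/* Hypothetical helper: submit an already-prepared journal bio
	 * with a cache flush before it starts (REQ_PREFLUSH) and its
	 * own payload forced to media at completion (REQ_FUA). */
	static void submit_log_bio(struct bio *bio)
	{
		bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA;
		submit_bio(bio);
	}

	/*
	 * t0: submit_log_bio(A);    A now in flight
	 * t1: submit_log_bio(B);    A issued but not yet completed
	 *
	 * Per the documentation quoted above, B's preflush only
	 * promises that writes *completed* before t1 are stable before
	 * B starts. A was merely issued, not completed, so a device
	 * that makes B's payload stable ahead of A's appears to stay
	 * within the documented contract -- which is exactly the
	 * ordering question here.
	 */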
Nevertheless, if XFS is expecting this guarantee, that would certainly
be the source of this corruption.

Thanks again!

On Mon, Jun 12, 2017 at 7:50 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Fri, Jun 09, 2017 at 10:06:26PM -0400, Sweet Tea Dorminy wrote:
>> > What is the xfs_info for this filesystem?
>> meta-data=/dev/mapper/tracer-vdo0 isize=256    agcount=4, agsize=5242880 blks
>>          =                        sectsz=512   attr=2, projid32bit=0
>> data     =                        bsize=1024   blocks=20971520, imaxpct=25
>>          =                        sunit=0      swidth=0 blks
>> naming   =version 2               bsize=4096   ascii-ci=0
>> log      =internal                bsize=1024   blocks=10240, version=2
>>          =                        sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                    extsz=4096   blocks=0, rtextents=0
>>
>> > What granularity are these A and B regions (sectors or larger)?
>> A is 1k, B is 3k.
>>
>> > Are you running on some kind of special block device that reproduces this?
>> It's a device we are developing, asynchronous, which we believe obeys
>> FLUSH and FUA correctly but may have missed some case;
>
> So Occam's Razor applies here....
>
>> we encountered this issue when testing an XFS filesystem on it, and
>> other filesystems appear to work fine (although obviously we could
>> have merely gotten lucky).
>
> XFS has quite sophisticated async IO dispatch and ordering
> mechanisms compared to other filesystems and so frequently exposes
> problems in the underlying storage layers that other filesystems
> don't exercise.
>
>> Currently, when a flush returns from the device, we guarantee the
>> data from all bios completed before the flush was issued is stably
>> on disk;
>
> Yup, that's according to
> Documentation/block/writeback_cache_control.txt, however....
>
>> when a write+FUA bio returns from the device, the data in that bio
>> (only) is guaranteed to be stable on disk. The device may, however,
>> commit sequentially issued write+fua bios to disk in an arbitrary
>> order.
>
> .... XFS issues log writes with REQ_PREFLUSH|REQ_FUA. This means
> sequentially issued log writes have clearly specified ordering
> constraints. i.e. the preflush completion order requirements means
> that the block device must commit preflush+write+fua bios to stable
> storage in the exact order they were issued by the filesystem....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
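P.S. For concreteness, the expectation described above amounts to
treating a preflush as a barrier against previously *issued* writes,
not just previously *completed* ones. A minimal sketch of one way a
device could provide that, by draining in-flight writes before
servicing the flush (hypothetical names throughout -- my_dev,
inflight_writes, flush_volatile_cache() -- not our actual
implementation):

	#include <linux/atomic.h>
	#include <linux/wait.h>
	#include <linux/bio.h>

	struct my_dev {
		atomic_t inflight_writes;    /* issued but not yet stable */
		wait_queue_head_t drain_wq;  /* woken when the count hits 0 */
	};

	/* Hypothetical: make the device's volatile cache durable. */
	void flush_volatile_cache(struct my_dev *dev);

	static void my_dev_handle_bio(struct my_dev *dev, struct bio *bio)
	{
		if (bio->bi_opf & REQ_PREFLUSH) {
			/* Drain: every previously issued write must
			 * reach media before this bio's payload starts,
			 * giving the issue-order guarantee, not just
			 * the completion-order one. */
			wait_event(dev->drain_wq,
				   atomic_read(&dev->inflight_writes) == 0);
			flush_volatile_cache(dev);
		}
		/* ... queue the write itself, honoring REQ_FUA before
		 * signaling completion ... */
	}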