Dave,

> That's where arbitrary delays in the storage stack below XFS cause
> problems - if the first FUA log write is delayed, the next log
> buffer will get filled, issued and delayed, and when we run out of
> log buffers (there are 8 maximum) the entire log subsystem will
> stall, stopping *all* log commit operations until log buffer
> IOs complete and become free again. i.e. it can stall modifications
> across the entire filesystem while we wait for batch timeouts to
> expire and issue and complete FUA requests.

To me, this sounds like a design failure in the XFS log subsystem,
or just a limitation of metadata journaling.

> IMNSHO, REQ_FUA/REQ_FLUSH optimisations should be done at the
> point where they are issued - any attempt to further optimise them
> by adding delays down in the stack to aggregate FUA operations will
> only increase latency of the operations that the issuer want to have
> complete as fast as possible....

Further optimization in a lower layer of the stack can benefit any
filesystem. So your opinion is not always correct, although it is
always correct for error handling and memory management.

I have proposed a future plan to use persistent memory. I believe
that with this leap forward, filesystems will no longer need to do
this kind of write-barrier optimization themselves. For more detail,
please see my post:
https://lkml.org/lkml/2013/10/4/186

However, I think I should leave an option to disable the optimization
in case the upper layer doesn't like it. Maybe writeboost should
disable deferring barriers when the barrier_deadline_ms parameter is
set to 0.

The Linux kernel's layered architecture is obviously not always
perfect, so there are similar cases at other boundaries, such as
O_DIRECT for bypassing the page cache. Maybe dm-thin and dm-cache
should add such a switch as well.

Akira

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel