On Monday May 28, nikita@xxxxxxxxxxxxx wrote: > Neil Brown writes: > > > > [...] > > > Thus the general sequence might be: > > > > a/ issue all "preceding writes". > > b/ issue the commit write with BIO_RW_BARRIER > > c/ wait for the commit to complete. > > If it was successful - done. > > If it failed other than with EOPNOTSUPP, abort > > else continue > > d/ wait for all 'preceding writes' to complete > > e/ call blkdev_issue_flush > > f/ issue commit write without BIO_RW_BARRIER > > g/ wait for commit write to complete > > if it failed, abort > > h/ call blkdev_issue > > DONE > > > > steps b and c can be left out if it is known that the device does not > > support barriers. The only way to discover this to try and see if it > > fails. > > > > I don't think any filesystem follows all these steps. > > It seems that steps b/ -- h/ are quite generic, and can be implemented > once in a generic code (with some synchronization mechanism like > wait-queue at d/). Yes and no. It depends on what you mean by "preceding write". If you implement this in the filesystem, the filesystem can wait only for those writes where it has an ordering dependency. If you implement it in common code, then you have to wait for all writes that were previously issued. e.g. If you have two different filesystems on two different partitions on the one device, why should writes in one filesystem wait for a barrier issued in the other filesystem. If you have a single filesystem with one thread doing lot of over-writes (no metadata changes) and the another doing lots of metadata changes (requiring journalling and barriers) why should the data write be held up by the metadata updates? So I'm not actually convinced that doing this is common code is the best approach. But it is the easiest. The common code should provide the barrier and flushing primitives, and the filesystem gets to use them however it likes. NeilBrown - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html