On Wed, Dec 17, 2008 at 04:40:02PM -0500, Bill Davidsen wrote:
> What really bothers me is that there's no obvious need for
> barriers at the device level if the file system is just a bit
> smarter and does its own async io (like aio_*), because you can
> track writes outstanding on a per-fd basis, so instead of stopping
> the flow of data to the drive, you can just block a file
> descriptor and wait for the count of outstanding i/o to drop to
> zero. That provides the order semantics of barriers as far as I
> can see, having tirelessly thought about it for ten minutes or so.

Well, you've pretty much described the algorithm XFS uses in its
transaction system - it's entirely asynchronous - and it's been
clear for many, many years that this model is broken when you have
devices with volatile write caches and internal re-ordering.

I/O completion on such devices does not guarantee the data is safe
on stable storage. If the device does not commit writes to stable
storage in the same order they are signalled as complete (i.e.
internal device re-ordering occurred after completion), then the
device violates the fundamental assumptions about I/O completion
that the above model relies on.

XFS uses barriers to guarantee that devices don't lie about the
completion order of critical I/O, not that the I/Os are on stable
storage. The fact that this causes cache flushes to stable storage
is a result of the implementation of that ordering guarantee.

I'm sure the Linux barrier implementation could be smarter and
faster (for some hardware), but for an operation that is used to
guarantee integrity I'll take conservative and safe over smart and
fast any day of the week....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
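[Editor's aside: the per-fd scheme Bill proposes - count outstanding writes, block the fd, wait for the count to drain to zero before issuing a dependent write - can be sketched as below. This is a hypothetical toy in Python (class and method names are invented, threads stand in for the async I/O engine), not anything a real filesystem does in userspace. Note that it demonstrates exactly the assumption Dave attacks: "completed" is taken to mean "stable", which a device with a volatile write cache does not guarantee.]

```python
import threading
import time
import random

class OrderedFd:
    """Toy per-fd tracker: drain outstanding writes before a dependent write.

    Caveat (Dave's point): on a device with a volatile write cache,
    completion order != order reached on stable media, so this drain
    alone does not provide real barrier/integrity semantics.
    """

    def __init__(self):
        self.outstanding = 0                 # count of in-flight writes
        self.cond = threading.Condition()
        self.log = []                        # order in which writes *complete*

    def submit(self, data):
        # Account for the write before it is in flight.
        with self.cond:
            self.outstanding += 1

        def io():
            time.sleep(random.uniform(0, 0.01))   # simulate device latency
            with self.cond:
                self.log.append(data)             # "completion" signalled
                self.outstanding -= 1
                self.cond.notify_all()

        threading.Thread(target=io).start()

    def drain(self):
        # The proposed "barrier": block until the outstanding count is zero.
        with self.cond:
            self.cond.wait_for(lambda: self.outstanding == 0)

fd = OrderedFd()
for block in ("data1", "data2", "data3"):
    fd.submit(block)
fd.drain()              # every pre-barrier write has signalled completion
fd.submit("commit")     # dependent write issued only after the drain
fd.drain()
print(fd.log[-1])       # "commit" always completes last
```

The drain gives ordering between completion events on the fd; what it cannot give is the guarantee that those completed writes are on stable storage, which is the property the block-layer barrier (and its cache flush) actually provides.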