[ ... ]

> But - as far as I understood - the filesystem doesn't have to
> wait for barriers to complete, but could continue issuing IO
> requests happily. A barrier only means, any request prior to
> that have to land before and any after it after it.
> It doesn't mean that the barrier has to land immediately and
> the filesystem has to wait for this. At least that always was
> the whole point of barriers for me. If thats not the case I
> misunderstood the purpose of barriers to the maximum extent
> possible.

Unfortunately that seems to be the case. The purpose of barriers is to
guarantee that the relevant data is known to be on persistent storage (a
kind of hardware 'fsync'). In effect a write barrier means "tell me when
the relevant data is on persistent storage", or less precisely
"flush/sync writes now and tell me when it is done". Ordering
properties are just a side effect.

That is, when the application (the file system in the case of metadata,
a user process in the case of data) knows that a barrier operation is
complete, it knows that all data involved in the barrier operation is
on persistent storage. In the case of serially dependent transactions,
applications do wait until the previous transaction has completed
before starting the next one (e.g. creating potentially many files in
the same directory, something that 'tar' does).

"All data involved" is usually all previous writes, but in more
sophisticated cases it can be just specific writes.

So an application, at transaction end points (for a file system,
metadata updates), issues a write barrier and then waits for its
completion. If the host adapter/disk controller doesn't have
persistent storage, then completion (should) only happen when the data
involved is actually on disk; if it does have it, then multiple
barriers can be outstanding, provided the host adapter/disk controller
supports multiple outstanding operations (e.g. thanks to tagged
queueing).
The best case is when the IO subsystem supports all of these:

* tagged queueing: multiple write barriers can be outstanding;

* fine-grained (specific writes, not all writes) barriers: only the
  metadata writes need to be flushed to persistent storage, not any
  intervening data writes too;

* persistent caches in the host adapter and/or disk controller: as
  long as those caches have space, barriers can complete immediately,
  without waiting for a write to disk.

It just happens that typical contemporary PC IO subsystems (at the
hardware level, not the Linux level) have none of those features,
except sometimes for NCQ, which is a reduced form of TCQ and
apparently is not that useful. Write barriers are also useful without
persistent caches, if there is proper tagged queueing and fine
granularity.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html