On Monday January 3, andy@xxxxxxxxxxxxxx wrote:
> > I have no idea which of you to believe now. :(

Well, how about I wade in.....

(almost*) No block storage device will guarantee that write ordering is maintained. Neither will read requests necessarily be ordered.

Any SCSI, IDE, or similar disc drive in Linux (or any other non-toy OS) will have requests managed by an "elevator algorithm" which coalesces adjacent blocks and tries to re-order requests to make optimal use of the device.

A RAID controller, whether software, firmware, or hardware, will also re-order requests to make best use of the devices.

Any filesystem that assumes that requests will not be re-ordered is broken, as the assumption is wrong. I would be *very* surprised if Reiserfs makes this assumption.

Until relatively recently, the only assumption that could be made is that a write request will be handled sometime between when it is made, and when the request completes (i.e. the end_io callback is called). If several requests are concurrent they could commit in any order.

With only this guarantee, the simplest approach for a journalling filesystem is to write the content of a journal entry, wait for the writes to complete, and then write a single block "header" which describes and hence commits that journal entry. The journal entry is not "safe" until this second write completes.

This is equally applicable for IDE drives, SCSI drives, software RAID1, software RAID5, hardware RAID etc.

More recently (2.6 only) Linux has had support for "write barriers". The idea here is that you submit a number of write requests, then a "barrier", then some more write requests. (The "barrier" might be a flag on the last request of a list, I'm not sure of that detail). The meaning is that no write request submitted after the barrier will be attempted until all requests submitted before the barrier are complete.
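To make the barrier rule concrete, here is a toy model in Python. It is only a sketch: the BARRIER marker and the dispatch_order() function are made up for illustration and are not kernel interfaces, and the barrier is modelled as a separate queue entry rather than a flag on a request. Requests between two barriers may complete in any order, but nothing crosses a barrier.

```python
import random

# Hypothetical marker for illustration only; a real block layer might
# instead set a flag on a request.
BARRIER = object()

def dispatch_order(queue, rng):
    """Return one order in which a device might complete `queue`.

    Requests between two barriers form a group that may be re-ordered
    freely; no request moves across a barrier.
    """
    completed = []
    group = []
    for item in queue:
        if item is BARRIER:
            rng.shuffle(group)       # any order inside the group is allowed
            completed.extend(group)  # everything before the barrier finishes first
            group = []
        else:
            group.append(item)
    rng.shuffle(group)               # trailing requests after the last barrier
    completed.extend(group)
    return completed
```

For a queue such as ["a", "b", "c", BARRIER, "d", "e"], every result has "a", "b" and "c" (in some order) ahead of "d" and "e".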
Some drives support this concept natively, so Linux simply does not re-order requests across a barrier, and sends the barrier at the appropriate time. Drives can do their own re-ordering but will not re-order across a barrier (if they support the barrier concept).

If Linux needs to write a barrier to a device that doesn't support barriers (as md/raid currently doesn't), it will (should) submit all requests before the barrier, flush them out, wait for them to complete, then allow other requests to be forwarded.

In short, md/raid provides the same guarantees as normal drives, and any filesystem that expects more is broken.

Definitely put your journal on RAID with at least as much redundancy as your main filesystem (I put my filesystem on raid5 and my journal on raid1).

NeilBrown

* I happen to know that the "umem" NVRAM driver will never re-order requests, as there is no value in re-ordering requests to RAM. But it is the exception, not the rule.
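P.S. For anyone who wants to see the pre-barrier journal commit scheme described above spelled out, here is a rough user-space analogue in Python. It is a sketch under assumptions, not how any particular filesystem does it: commit_journal_entry() and write_all() are hypothetical names, the layout (header first, payload directly after it) is invented for the example, and os.fsync() stands in for "wait for the writes to complete". Whether fsync really reaches the platter depends on the drive and its cache settings.

```python
import os

def write_all(fd, data, offset):
    """Write all of `data` at `offset`, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written

def commit_journal_entry(fd, offset, payload, header):
    """Two-step commit: payload first, then the header that commits it.

    Assumed layout (illustrative only): `header` at `offset`,
    `payload` immediately after it.
    """
    # Step 1: write the content of the journal entry.
    write_all(fd, payload, offset + len(header))
    # Wait for those writes to complete before going any further.
    os.fsync(fd)
    # Step 2: write the header, which describes and commits the entry.
    write_all(fd, header, offset)
    # The entry is not "safe" until this second write has completed.
    os.fsync(fd)
```

If the machine dies before the second fsync returns, recovery either finds no header and ignores the entry, or finds a header whose payload is already known to be complete.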