On 02/28/2014 11:42 PM, Bob Bawn wrote:
> I am trying to understand how IO ordering safety is enforced with on
> path failover. This is new territory for me so forgive me if this is
> obvious. Consider the sequence:
>
> 1. client writes(lba=0,val=x) on path A.
> 2. multipath declares path A dead and retries write on path B
> 3. retried write on path B completes successfully and client get ack'd
> 4. client writes(lba=0,val=y) on path B. It also completes
>    successfully and is ack'd to client
> 5. write from (1) completes and corrupts data
>
> It seems like multipath needs a guarantee at step 2 that the original
> write won't complete after path A has been declared down. I thought it
> would issue something like a LUN RESET on path B and that the response
> to that reset would indicate that it is safe to proceed. This page
> sort of supports that speculation:
> http://scst.sourceforge.net/mc_s.html
>
No, that assumption is wrong.

Strict ordering is only guaranteed for commands submitted from the HBA
to the wire. Once a command is in flight there are _no_ guarantees
about ordering. E.g., in an FC fabric there might be several paths to
the same target, each of which might have a different latency, so I/O
on one path might actually be faster than on the other.

And with CNAs it's virtually impossible to guarantee any I/O ordering,
due to the several hardware queues involved etc.

The same goes for the Linux block layer; the only _enforced_ ordering
of sorts is done by I/O being sent from the page cache, as each page
can submit only one I/O at a time. But as soon as you're using O_DIRECT
you don't have any ordering guarantees either, and it's up to the
application to ensure any ordering requirements. Which is also what all
filesystems do; for any critical section they wait for the I/O result
before continuing.

So for failover, any retries will be covered by the multipath layer,
and only the final I/O result will be returned to the upper layers,
rendering any multipath failover invisible.
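
To illustrate the "wait for the I/O result before continuing" rule from the application side, here is a minimal Python sketch. It is not taken from multipath or from any filesystem; the helper names `ordered_write` and `demo` and the 512-byte block size are made up for this example, and it uses a plain temporary file rather than an O_DIRECT multipath device. The second write to the same offset is only issued after the first one has been written and `os.fsync()` has returned, so nothing the application issued earlier can overwrite it later.

```python
import os
import tempfile

BLOCK = 512  # illustrative block size, an assumption for this sketch


def ordered_write(fd, offset, payload):
    """Write payload at offset and return only after fsync() has succeeded.

    Hypothetical helper: the caller must not issue a dependent write
    until this function has returned.
    """
    written = os.pwrite(fd, payload, offset)
    if written != len(payload):
        raise OSError("short write")
    # fsync() blocks until the kernel reports the data as written and
    # raises OSError if the I/O failed.
    os.fsync(fd)
    return written


def demo():
    """Overwrite the same block twice, strictly one write after the other."""
    fd, path = tempfile.mkstemp()
    try:
        ordered_write(fd, 0, b"x" * BLOCK)  # first write (val=x)
        ordered_write(fd, 0, b"y" * BLOCK)  # issued only after x completed
        return os.pread(fd, BLOCK, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

On a multipath device the same pattern should apply: by the reasoning above, the completion or error the application sees is the final result after any retries in the multipath layer.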
Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel