Re: persistent reservation behaviour with dm-multipath

On Wednesday 23 July 2008 at 16:28 -0400, Douglas Gilbert wrote:
> Christophe Varoqui wrote:
> > The current dm-multipath behaviour is a potent data corrupter on
> > Persistent Reservation-based clusters sharing multipaths with the
> > queue_if_no_path feature on (Clariion, Storageworks, ...).
> > 
> > Consider the following scenario :
> > 
> > - Node A takes a write-exclusive persistent reservation on LU
> > - Node B submits a write io to LU, which is a sda-sdb multipath
> > - B dm_multipath routes the wio to sda, the wio is failed, the path is
> > marked failed
> > - B dm_multipath routes the wio to sdb, the wio is failed, the last
> > path is marked failed
> > - B queues the wio because of the queue_if_no_path feature. Process
> > submitting the wio is stuck in D-state.
> > - A releases the reservation. Queued wios are unqueued, corrupting the
> > data on LU.
> > 
> > I suspect wio returning a "reservation conflict" status should never be
> > queued.
> > 
> > DM suspend/resume on the multipath devmap effectively flushes the queue,
> > but this solution leaves a window open for data corruption, between io
> > enqueue and user-space driven queue flush.
> > 
> > Is there work in progress to address this issue yet? What would be an
> > acceptable solution design (for example, Mike Christie suggested in Aug
> > 2005 a scsi-to-blk error translation patch, which got nowhere)?
> 
> If memory serves, a SCSI command status of RESERVATION
> CONFLICT did not find its way back to the sg driver API
> (and/or the command was retried). Is that still the case?
> 
As far as I can tell, the scsi subsystem alone behaves as expected: a
wio on a reserved-by-other scsi device gets errored nicely, with no retry
and a clean message indicating the wio error cause in the dmesg.

The device-mapper multipath target, on the other hand, can be configured
to queue ios errored by the scsi layer. This is a desirable behaviour
when we know we face a transient all-paths-down situation (like a LU
trespass on a Clariion controller pair), but it is not so smart when
the io was errored due to a reservation conflict.
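
To make the scenario quoted above concrete, here is a minimal toy model
in Python. It is not kernel code: LogicalUnit, ToyMultipath, the
queue_on_conflict switch and the status strings are all hypothetical
names invented for illustration. It only assumes what is described in
this thread: a conflicting write fails on every path, the failed wio is
queued under queue_if_no_path, and it is replayed once the reservation
is released.

```python
from collections import deque


class LogicalUnit:
    """Toy LU with at most one write-exclusive reservation holder."""

    def __init__(self):
        self.reserved_by = None
        self.blocks = {}

    def write(self, node, lba, data):
        # A write from any node other than the holder is rejected.
        if self.reserved_by is not None and self.reserved_by != node:
            return "reservation_conflict"
        self.blocks[lba] = data
        return "ok"


class ToyMultipath:
    """Hypothetical model of a multipath map over one LU."""

    def __init__(self, node, lu, paths, queue_if_no_path=True,
                 queue_on_conflict=True):
        self.node = node
        self.lu = lu
        self.paths = list(paths)
        self.failed = set()
        self.queue = deque()
        self.queue_if_no_path = queue_if_no_path
        # True mimics the behaviour described in this thread;
        # False is the suggested behaviour (never queue a conflict).
        self.queue_on_conflict = queue_on_conflict

    def submit(self, lba, data):
        for path in self.paths:
            if path in self.failed:
                continue
            status = self.lu.write(self.node, lba, data)
            if status == "ok":
                return "ok"
            if status == "reservation_conflict" and not self.queue_on_conflict:
                # Error the wio to the submitter, leave the path alone.
                return "error"
            self.failed.add(path)
        if self.queue_if_no_path:
            self.queue.append((lba, data))
            return "queued"
        return "error"

    def reinstate_and_flush(self):
        # Paths come back, queued wios are replayed in order.
        self.failed.clear()
        pending, self.queue = self.queue, deque()
        for lba, data in pending:
            self.submit(lba, data)


def run(queue_on_conflict):
    lu = LogicalUnit()
    lu.reserved_by = "A"
    lu.write("A", 0, "data from A")
    b = ToyMultipath("B", lu, ["sda", "sdb"],
                     queue_on_conflict=queue_on_conflict)
    result = b.submit(0, "stale data from B")
    lu.reserved_by = None          # node A releases the reservation
    b.reinstate_and_flush()
    return result, lu.blocks[0]


if __name__ == "__main__":
    print(run(True))    # wio queued, then replayed over A's data
    print(run(False))   # wio errored immediately, A's data intact
```

With queue_on_conflict=True the model reproduces the corruption: B's
wio is queued and lands on the LU after the release. With
queue_on_conflict=False the wio is errored back to the submitter and
nothing is replayed.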

The problem here is how the scsi layer can instruct the multipath dm
driver not to queue an errored io. This is what Mike's patch tried to
address.
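
As a rough illustration of what such a hint from the scsi layer could
look like, here is a small Python sketch. It is not a reconstruction of
Mike's patch and not an existing kernel interface: IoErrorClass,
classify() and should_queue() are hypothetical names. Only the SCSI
status byte values are real. The idea is simply that the lower layer
tags each error as path-related or target-related, and the multipath
layer queues only the former.

```python
from enum import Enum

# SCSI status byte values as defined by the SCSI architecture model.
SCSI_GOOD = 0x00
SCSI_BUSY = 0x08
SCSI_RESERVATION_CONFLICT = 0x18


class IoErrorClass(Enum):
    """Hypothetical error classes passed up to the block layer."""
    NONE = "none"      # io succeeded
    PATH = "path"      # may succeed on another path, or later
    TARGET = "target"  # would fail on any path, retrying is pointless


def classify(status, transport_failed=False):
    """Translate a scsi-level result into a hypothetical error class."""
    if transport_failed:
        # The command never reached the device: a path problem.
        return IoErrorClass.PATH
    if status == SCSI_GOOD:
        return IoErrorClass.NONE
    if status == SCSI_RESERVATION_CONFLICT:
        # The device answered and refused the io; another path to the
        # same LU would refuse it as well.
        return IoErrorClass.TARGET
    # Assumption of this sketch: anything else keeps today's behaviour
    # and is treated as a path error.
    return IoErrorClass.PATH


def should_queue(error_class, queue_if_no_path):
    """Queue only errors that a path change could plausibly cure."""
    return queue_if_no_path and error_class is IoErrorClass.PATH
```

Under these assumptions a reservation conflict is never queued, even
with queue_if_no_path set, while a transport failure during a trespass
still is.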

Regards,
cvaroqui

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
