Re: RFC: multipath IO multiplex

Wouldn't it be practical to bypass MPIO completely and submit your IO to the paths directly instead?
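
Something along these lines, say (a rough userspace sketch; /dev/dm-0, the
one-sector slot at offset 0, and the "fence" message are assumptions for
illustration, but the slaves/ directory in sysfs does list the underlying
path devices of a dm map):

#define _GNU_SOURCE             /* for O_DIRECT */
/* Sketch: send one sector down every path of a dm device directly,
 * bypassing the multipath target entirely. */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        DIR *slaves = opendir("/sys/block/dm-0/slaves");
        struct dirent *d;
        char dev[300];
        void *buf;

        /* O_DIRECT needs a sector-aligned buffer. */
        if (!slaves || posix_memalign(&buf, 512, 512))
                return 1;
        memset(buf, 0, 512);
        strcpy(buf, "fence");                   /* the poison pill */

        while ((d = readdir(slaves)) != NULL) {
                if (d->d_name[0] == '.')
                        continue;
                snprintf(dev, sizeof(dev), "/dev/%s", d->d_name);
                int fd = open(dev, O_WRONLY | O_DIRECT | O_SYNC);
                if (fd < 0) {
                        perror(dev);            /* dead path: fails fast */
                        continue;
                }
                if (pwrite(fd, buf, 512, 0) != 512)
                        perror(dev);
                close(fd);
        }
        closedir(slaves);
        return 0;
}

A failed path then costs you one quick error rather than the full MPIO
failover timeout.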

Cheers,
cvaroqui

----- Original Message -----
> On 2010-11-06T11:51:02, Alasdair G Kergon <agk@xxxxxxxxxx> wrote:
>
> Hi Neil, Alasdair,
>
> thanks for the feedback. Answering your points in reverse order -
>
> > Might it make sense to configure a range of the device where writes
> > always went down all paths?  That would seem to fit with your
> > problem description and might be easiest?
> > Indeed - a persistent property of the device (even another interface
> > with a different minor number) not the I/O.
>
> I'm not so sure that would be required though. The equivalent of our
> "mkfs" tool wouldn't need this. Also, typically, this would be a
> partition (kpartx) on top of a regular MPIO mapping (that we want to be
> managed by multipathd).
>
> Handling this completely differently would complicate setup, no?
>
> > And what is the nature of the data being written, given that I/O to
> > one path might get delayed and arrive long after it was sent,
> > overwriting data sent later.  Successful stale writes will always be
> > recognised as such by readers - how?
>
> The very particular use case I am thinking of is the "poison pill" for
> node-level fencing. Nodes constantly monitor their slot (using direct
> IO, bypassing all caching, etc.), and either read it successfully or
> commit suicide (assisted by a hardware watchdog to protect against
> stalls).
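>
> (The watcher side boils down to something like this minimal sketch;
> the device name, slot index, message format, and 1s interval are
> made-up illustrations, and the watchdog handling itself is left out:)
>
> #define _GNU_SOURCE             /* for O_DIRECT */
> /* Monitor our own slot via direct IO; suicide on a poison pill or
>  * on a failed read. */
> #include <fcntl.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/reboot.h>
>
> #define SLOT_SIZE 512
> #define MY_SLOT   3                             /* assumed slot index */
>
> int main(void)
> {
>         void *buf;
>         int fd = open("/dev/mapper/mpatha", O_RDONLY | O_DIRECT);
>
>         if (fd < 0 || posix_memalign(&buf, SLOT_SIZE, SLOT_SIZE))
>                 return 1;
>
>         for (;;) {
>                 if (pread(fd, buf, SLOT_SIZE,
>                           (off_t)MY_SLOT * SLOT_SIZE) != SLOT_SIZE)
>                         reboot(RB_AUTOBOOT);    /* read error: self-fence */
>                 if (!memcmp(buf, "fence", 5))
>                         reboot(RB_AUTOBOOT);    /* poison pill: suicide */
>                 sleep(1);       /* the watchdog would be petted here; a
>                                    pread that stalls is its job to catch */
>         }
> }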
>
> The writer knows that, once the message has been successfully written,
> the target node will either have read it (and committed suicide), or
> been self-fenced because of a timeout/read error.
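>
> (In numbers, for illustration only: with, say, a 1s read loop and a
> 5s watchdog timeout on the target, the writer has to wait roughly
> loop + watchdog = 6s after a successful write before it may treat the
> node as fenced, and any MPIO failover timeouts push back the point at
> which the write counts as successfully written in the first place.)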
>
> Allowing for the additional timeouts incurred by MPIO here really slows
> this mechanism down to the point of being unusable.
>
> Now, even if a write were delayed (which is not very likely; it's more
> likely that some of the IO would simply fail if one of the paths did go
> down, and this would not be resubmitted to other paths), the worst that
> could happen would be a double fence. That would require the write to
> land after the node has cycled once and cleared its message slot, which
> already implies a significant delay, since servers take a while to boot.
>
> For the 'heartbeat' mechanism and others (if/when we get around to
> adding them), we could ignore the exact contents that have been written
> and just watch for changes; at worst, node death detection will take a
> bit longer.
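>
> (Continuing the watcher sketch above, with cur/last/PEER_SLOT etc. as
> hypothetical names: keep the previous slot contents around and treat
> any change as proof of life,
>
>         /* any change in the peer's slot counts as a heartbeat */
>         pread(fd, cur, SLOT_SIZE, (off_t)PEER_SLOT * SLOT_SIZE);
>         if (memcmp(cur, last, SLOT_SIZE)) {
>                 last_seen = time(NULL);         /* peer is alive */
>                 memcpy(last, cur, SLOT_SIZE);
>         }
>
> so the exact bytes written never matter.)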
>
> Basically, what we need to get around is the possible IO latency in
> MPIO, for things like poison pill fencing ("storage-based death") or
> qdisk-style plugins. I'm open to other suggestions as well.
>
>
>
> Regards,
>        Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
