Moger, Babu wrote:
Hi Hannes,
I have tested the patch you had sent. Failover works fine.
But we are seeing problems during failback: it causes continuous mode-select thrashing (ping-pong).
The reason is that the handler does not know whether a mode select is for failover or failback. Every mode select moves all of the LUNs, regardless of whether they are already on their preferred path. On the next polling interval, multipathd will find that some LUNs are not on their preferred path and will initiate another failback. The result is a continuous ping-pong. I have explained this in EXAMPLE 1 below.
For failback to work properly, we need selective LUN-level failover.
There is also a cluster scenario where we can get into thrashing with controller-level failover. Please see EXAMPLE 2 below.
We have been testing LUN-level failover with device-mapper for a while now. It is working well for us; the only problem is slower failovers with big configurations (failover was taking about 12 minutes with 234 LUNs). LSI and IBM (Chandra) have been working on asynchronous behavior for the past 3-4 months. I have tested all the patches Chandra has posted and we have seen very good results (failover takes only 1 minute with 234 LUNs).
These patches also give other handlers the opportunity to move to asynchronous behavior if they wish. We need your (and the Linux community's) help to review the patches and move forward on this issue.
Thanks
Babu Moger
LSI Corporation
Following are two examples where we can see mode-select thrashing.
EXAMPLE 1 (mode-select thrashing with 2 LUNs on a single host)
=======================================================
Let's take a very simple example.
I have 2 LUNs on my host. The host sees both controllers, with one path to each controller.
LUN 0 is owned by controller A and its preferred owner is A.
LUN 1 is owned by controller B and its preferred owner is B.
Here is the multipath -ll output:
mpath237 (3600a0b80000f519c0000cc8a48fc7d0b) dm-4 LSI,INF-01-00
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
\_ 1:0:0:0 sde 8:64 [active][ready] (controller A)
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdi 8:128 [active][ghost] (controller B)
mpath180 (3600a0b80000f519c0000cc9048fc7d7b) dm-5 LSI,INF-01-00
[size=2.0G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][active]
\_ 2:0:0:1 sdj 8:144 [active][ready] (controller B)
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:1 sdf 8:80 [active][ghost] (controller A)
1. Run I/O on both LUNs.
2. Pull the cable connected to controller A.
3. Failover happens and LUN 0 moves to controller B. Now both LUNs are on controller B.
4. Connect the cable back on controller A.
5. The multipath tool detects the physical LUNs on controller A and runs the priority check.
6. It finds that LUN 0 is not on its preferred path and initiates a failback.
   Because this is a controller-level failover, it also moves LUN 1 to controller A.
   Now both LUNs are on controller A.
7. The multipath tool comes around again, finds LUN 1 not on its preferred path and initiates a failback.
   This moves both LUNs back to controller B.
   This continues forever.
Hmm. Yes, correct.
After all, the patch I sent was meant to be a proof of concept, not a fully fledged
solution. (In fact, I'm quite surprised it worked so well :-)
What about modifying the LUN select code to switch all _visible_ LUNs (i.e. all LUNs which
are _not_ on the preferred path) in one go?
That way we wouldn't run into this issue.
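Something along these lines, perhaps. This is only a minimal sketch of the selection logic; struct lun_info, MAX_LUNS and build_transfer_set() are illustrative names I'm making up here, not the actual scsi_dh_rdac structures:

/*
 * Illustrative sketch only -- the names below are made up and do not
 * match the real scsi_dh_rdac code.  The idea: one mode select sent to
 * target_ctrl transfers every visible LUN that prefers that controller
 * but currently sits on the other one, and leaves the rest alone.
 */
#define MAX_LUNS 256

struct lun_info {
	int visible;		/* LUN reachable through this host */
	int current_owner;	/* controller currently owning the LUN */
	int preferred_owner;	/* preferred controller for the LUN */
};

/* Mark the LUNs to include in a single mode select to target_ctrl. */
static int build_transfer_set(const struct lun_info *luns,
			      unsigned char *transfer, int target_ctrl)
{
	int lun, count = 0;

	for (lun = 0; lun < MAX_LUNS; lun++) {
		transfer[lun] = 0;
		if (!luns[lun].visible)
			continue;
		/* Only touch LUNs that are off their preferred path. */
		if (luns[lun].current_owner != luns[lun].preferred_owner &&
		    luns[lun].preferred_owner == target_ctrl) {
			transfer[lun] = 1;
			count++;
		}
	}
	return count;	/* number of LUNs included in this transfer */
}

LUNs already sitting on their preferred controller are never part of a transfer, so a failback can no longer drag them along and restart the cycle.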
EXAMPLE 2 (mode-select thrashing in a cluster setup)
============================================================================
Let's take a two-node cluster environment where LUNs are visible across multiple nodes, although any
given LUN would only be accessed via one node at a time. If the cluster configuration were to get
into a state where one node only has visibility to one controller while the other node only has
visibility to the alternate controller, a “thrashing” condition could happen. Take this example:
• 32 LUNs have been mapped from the storage to all nodes.
• LUNs 0-15 are owned by the ‘A’ controller and are accessed by node #1; LUNs 16-31 are owned by ‘B’ and mapped to node #2.
• Node #1 only has access to the ‘A’ controller; node #2 only has access to the ‘B’ controller.
Let’s say node #1 decides to access LUN 16. Because it does not have visibility to the ‘B’ controller,
it must issue a volume transfer request. With the controller-level failover solution, the volume transfer
request would also move LUNs 17-31. If node #2 were accessing those LUNs, it would receive ownership errors,
causing a volume transfer request to move them back. However, this also moves LUN 16 from ‘A’ back to ‘B’,
causing node #1 to issue the volume transfer request again… etc.
Again, I think this can be solved by just moving the LUNs _not_ on the preferred path.
The difference from the existing solution would be to move all LUNs not on the preferred path in
one go, instead of moving the LUNs one by one.
Will see about drawing up a patch.
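For what it's worth, a toy user-space simulation of EXAMPLE 1 (purely illustrative, none of this is actual handler code) shows why the controller-level policy ping-pongs; a per-LUN policy that only moves LUNs off their preferred path would stop after the first poll:

/*
 * Toy simulation of EXAMPLE 1: two LUNs whose preferred owners are
 * controllers A and B, starting after a failover put both on B.
 * Illustrative only; nothing here is real handler or multipathd code.
 */
#include <stdio.h>

enum ctrl { A, B };

int main(void)
{
	int owner[2]     = { B, B };	/* after failover both LUNs sit on B */
	int preferred[2] = { A, B };
	int poll, lun;

	for (poll = 0; poll < 4; poll++) {
		for (lun = 0; lun < 2; lun++) {
			if (owner[lun] == preferred[lun])
				continue;
			/*
			 * Controller-level transfer: failing back this LUN
			 * drags the other LUN along as well -> ping-pong.
			 * A transfer limited to LUNs off their preferred
			 * path would move only owner[lun] and then stop.
			 */
			owner[0] = owner[1] = preferred[lun];
			printf("poll %d: failback of LUN %d moved both LUNs to %c\n",
			       poll, lun, preferred[lun] == A ? 'A' : 'B');
			break;
		}
	}
	return 0;
}

Run as-is it alternates A, B, A, B on every polling interval, which is exactly the thrashing described above.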
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@xxxxxxx +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)