Hello,

I have a two-node stretched cluster laid out as in the attached image (I hope it is possible to attach small files...). Both nodes have multipath installed.

The node on site 2 often gets abort commands. From /var/log/messages:

    May 14 11:35:22 orastud2 clurgmgrd: [6961]: <notice> Getting status
    May 14 11:35:23 orastud2 last message repeated 7 times
    May 14 11:59:56 orastud2 kernel: qla2xxx 0000:08:00.0: scsi(0:0:6): Abort command issued -- 1 10710 2002.
    May 14 11:59:56 orastud2 kernel: sd 0:0:0:6: timing out command, waited 300s
    May 14 11:59:56 orastud2 multipathd: /sbin/mpath_prio_alua exitted with 5
    May 14 11:59:56 orastud2 multipathd: error calling out /sbin/mpath_prio_alua 8:208
    May 14 12:35:22 orastud2 clurgmgrd: [6961]: <notice> Getting status
    May 14 12:35:23 orastud2 last message repeated 7 times
    May 14 12:41:25 orastud2 kernel: qla2xxx 0000:08:00.0: scsi(0:1:8): Abort command issued -- 1 16ca3 2002.
    May 14 13:35:22 orastud2 clurgmgrd: [6961]: <notice> Getting status
    May 14 14:35:23 orastud2 last message repeated 8 times

The messages above refer to the device that now (5 hours later) shows this:

    mpath3 (3600507630efe0b0c0000000000000603) dm-6 IBM,1750500
    [size=60G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=0][active]
     \_ 1:0:1:6 sdam 66:96  [active][undef]
     \_ 0:0:1:6 sdan 66:112 [active][undef]
    \_ round-robin 0 [prio=0][enabled]
     \_ 1:0:0:6 sdm  8:192  [active][undef]
     \_ 0:0:0:6 sdn  8:208  [active][undef]

In general, though, I get this kind of message for several devices, not only this one.

What do these multipath messages mean?

    multipathd: /sbin/mpath_prio_alua exitted with 5
    multipathd: error calling out /sbin/mpath_prio_alua 8:208

Any hints on debugging lines like this one?

    qla2xxx 0000:08:00.0: scsi(0:1:8): Abort command issued -- 1 16ca3 2002.

Could it be related to BB credits? The only affected node is node 2, which is one switch hop farther from the storage than node 1.
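On the BB-credit question, one rough check is whether the extra inter-switch link on the site 2 path has enough buffer-to-buffer credits to keep the link full. A credit is only returned after a frame crosses the link and the R_RDY comes back, so a port needs enough credits to cover one round trip of frames. The sketch below is the common rule-of-thumb calculation under assumptions that are not in the post: full-size 2148-byte frames, 8b/10b encoding at the 1/2/4/8GFC line rates, roughly 5 µs/km propagation in fiber, and made-up distances, since the real ISL length and link speed are not stated. Smaller frames need proportionally more credits.

```python
import math

# Line rates in baud for the 8b/10b-encoded Fibre Channel speeds (1/2/4/8GFC).
LINE_RATE_BAUD = {1: 1.0625e9, 2: 2.125e9, 4: 4.25e9, 8: 8.5e9}

FRAME_BYTES = 2148           # assumed full-size frame: 2112-byte payload plus header/CRC/delimiters
BITS_PER_BYTE_ON_WIRE = 10   # 8b/10b encoding puts 10 bits on the wire per byte
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in glass


def frame_time_us(speed_gfc: int) -> float:
    """Time to serialize one full-size frame onto the link, in microseconds."""
    return FRAME_BYTES * BITS_PER_BYTE_ON_WIRE / LINE_RATE_BAUD[speed_gfc] * 1e6


def bb_credits_needed(distance_km: float, speed_gfc: int) -> int:
    """Credits needed so the sender never stalls waiting for R_RDY.

    The credit for a frame comes back only after the frame reaches the far
    end and the R_RDY travels back, so the link must hold a full round trip
    worth of frames in flight.
    """
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return math.ceil(round_trip_us / frame_time_us(speed_gfc))


if __name__ == "__main__":
    # Hypothetical distances: the actual ISL length is not given in the post.
    for km in (1, 10, 25):
        for gfc in (2, 4):
            print(f"{km:>3} km @ {gfc}GFC: ~{bb_credits_needed(km, gfc)} credits")
```

Under these assumptions it reproduces the familiar figure of about one credit per km at 2GFC and two per km at 4GFC; the result can then be compared with the credits actually configured on the ISL ports of the extra switch.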
Since the SAN configuration of the two servers is identical and both are HP blades, my next step would be to swap them and see whether the problem moves with the blade. In the meantime, any further hints, or debugging flags I can set in multipath or other components, are welcome.

The OS is RHEL 5.3 x86_64 and the storage is an IBM DS6800.

Thanks,
Gianluca
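To see whether the aborts follow the blade after the swap, it may help to tally them per HBA and per host:target:LUN before and after. A minimal sketch, assuming: the three numbers in the qla2xxx scsi(H:T:L) field are host, target and LUN (this matches the sd 0:0:0:6 line logged in the same second as scsi(0:0:6)); abort lines look exactly like the excerpt above; and Python 3 is available. RHEL 5.3 ships Python 2.4, so this assumes running it on another machine against a copy of /var/log/messages.

```python
import re
import sys
from collections import Counter

# Matches qla2xxx abort messages such as:
#   kernel: qla2xxx 0000:08:00.0: scsi(0:1:8): Abort command issued -- 1 16ca3 2002.
# Assumption: the three scsi() fields are host:target:lun.
ABORT_RE = re.compile(
    r"qla2xxx (?P<pci>[0-9a-f:.]+): scsi\((?P<host>\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"Abort command issued"
)


def tally_aborts(lines):
    """Count abort messages per HBA PCI address and per host:target:lun."""
    per_hba = Counter()
    per_lun = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            per_hba[m["pci"]] += 1
            per_lun[f'{m["host"]}:{m["target"]}:{m["lun"]}'] += 1
    return per_hba, per_lun


def main(path):
    """Print abort counts for one log file, busiest HBA and LUN first."""
    with open(path, errors="replace") as f:
        per_hba, per_lun = tally_aborts(f)
    for pci, n in per_hba.most_common():
        print(f"HBA {pci}: {n} aborts")
    for lun, n in per_lun.most_common():
        print(f"  host:target:lun {lun}: {n} aborts")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage, with a hypothetical script name: python3 tally_aborts.py messages.copy. If the counts stay with the same HBA PCI address on the same chassis slot after the swap, the blade itself is less likely to be the cause.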
Attachment: san.jpg (JPEG image)
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel