Unstable isci path and confusing logging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, 25 Jun 2009, Charlie Brady wrote:

multipathd logs a path checker message whenever state changes. But it also logs every non-changed message when verbosity is >=4 and newstate is PATH_UP or PATH_GHOST, and every non-changed message when verbosity is >= 2 and newstate is PATH_DOWN.

I believe that the message should only be logged, once, when state changes.

I have a mystery related to state change logging which I hope somebody can help me solve.

I am running multipathd with the following patch applied (as well as the patch from https://www.redhat.com/archives/dm-devel/2009-June/msg00293.html which updates that checker message with each poll):

Signed-off-by: Charlie Brady <charlieb@xxxxxxxxxxxxxxxxxx>

--- multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:57.248455000 -0400 +++ mezzanine_patched_multipath-tools-0.4.7.rhel5.13/multipathd/main.c 2009-06-25 13:32:12.968639000 -0400
@@ -1041,7 +1041,6 @@
			 		enable_group(pp);
			 }
			 else if (newstate == PATH_UP || newstate ==
			 PATH_GHOST) {
-				LOG_MSG(4, checker_message(&pp->checker));
				 /*
				  * double the next check delay.
				  * max at conf->max_checkint
@@ -1055,8 +1054,6 @@
				 condlog(4, "%s: delay next check %is",
			 			pp->dev_t, pp->tick);
			}
-			else if (newstate == PATH_DOWN)
-				LOG_MSG(2, checker_message(&pp->checker));

			pp->state = newstate;

Therefore, the only logging of checker_message() remaining is this:

...
                        if (newstate != pp->state) {
                                int oldstate = pp->state;
                                pp->state = newstate;
                                LOG_MSG(1, checker_message(&pp->checker));
...

What I am seeing however is this:

[root@sun4150node2 ~]# grep 'rdac checker' /var/log/messages | tail -10
Jul  3 12:29:44 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:29:49 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:29:54 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:29:59 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:04 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:09 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:14 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:19 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:24 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:30:29 sun4150node2 multipathd: sdc: rdac checker reports path is up
[root@sun4150node2 ~]#

So I am seeing repeated logging of 'up' state messages, although I only expect to see one message per state change.

As further background, I have four iscsi paths configured to two ports on two controllers on a Sun StorageTek 2510 (a bug in the Sun CAM management software has also configured a non-working IPV6 session).

[root@sun4150node2 ~]# iscsiadm -m node -P 1
Target: iqn.1986-03.com.sun:2510.600a0b800049afbd000000004a36468f
        Portal: 10.37.13.101:3260,1
                Iface Name: default
        Portal: [fe80:0000:0000:0000:02a0:b8ff:fe4a:1c6d]:3260,1
                Iface Name: default
        Portal: 10.37.13.104:3260,2
                Iface Name: default
        Portal: 10.37.13.103:3260,2
                Iface Name: default
        Portal: 10.37.13.102:3260,1
                Iface Name: default
[root@sun4150node2 ~]#

Two of the 2510's port connections are disconnected (10.37.13.103 and 10.37.13.104).

The path checker apparently finds the 8:32 path as intermittent:

[root@sun4150node2 ~]# echo show paths | multipathd -k
multipathd> hcil     dev dev_t pri dm_st   chk_st   next_check
10:0:0:0 sdd 8:48  100 [failed][faulty] XXXXXXXX.. 4/5
11:0:0:0 sde 8:64  100 [failed][faulty] XXXXXXXX.. 4/5
9:0:0:0  sdb 8:16  0   [active][ready]  XXXXXX.... 12/20
12:0:0:0 sdc 8:32  0   [failed][faulty] XXXXXX.... 3/5
multipathd> [root@sun4150node2 ~]#
[root@sun4150node2 ~]#
[root@sun4150node2 ~]# echo show paths | multipathd -k
multipathd> hcil     dev dev_t pri dm_st   chk_st   next_check
10:0:0:0 sdd 8:48  100 [failed][faulty] XX........ 1/5
11:0:0:0 sde 8:64  100 [failed][faulty] XX........ 1/5
9:0:0:0  sdb 8:16  0   [active][ready]  XX........ 4/20
12:0:0:0 sdc 8:32  0   [active][ready]  XXXXXXXXXX 5/5
multipathd> [root@sun4150node2 ~]# echo show paths | multipathd -k
multipathd> hcil     dev dev_t pri dm_st   chk_st   next_check
10:0:0:0 sdd 8:48  100 [failed][faulty] XXXXXX.... 3/5
11:0:0:0 sde 8:64  100 [failed][faulty] XXXXXX.... 3/5
9:0:0:0  sdb 8:16  0   [active][ready]  XXXXXXXX.. 16/20
12:0:0:0 sdc 8:32  0   [failed][faulty] XXXX...... 2/5
multipathd> [root@sun4150node2 ~]#

Here is a longer excerpt from syslog:

Jul  3 12:33:41 sun4150node2 iscsid: connect failed (113)
Jul 3 12:33:44 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:33:44 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:33:44 sun4150node2 multipathd: mpath0: remaining active paths: 2
Jul  3 12:33:44 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:44 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:44 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:44 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:45 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8)
Jul  3 12:33:45 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8)
Jul 3 12:33:47 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32.
Jul  3 12:33:47 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:47 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:47 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:33:47 sun4150node2 multipathd: mpath0: remaining active paths: 1
Jul  3 12:33:47 sun4150node2 iscsid: connect failed (113)
Jul  3 12:33:47 sun4150node2 iscsid: connect failed (113)
Jul 3 12:33:49 sun4150node2 multipathd: sdc: rdac checker reports path is up
Jul  3 12:33:49 sun4150node2 multipathd: 8:32: reinstated
Jul  3 12:33:49 sun4150node2 multipathd: mpath0: remaining active paths: 2
Jul  3 12:33:49 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:49 sun4150node2 multipathd: mpath0: switch to path group #1
Jul  3 12:33:49 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:49 sun4150node2 multipathd: dm-2: devmap already registered
Jul  3 12:33:50 sun4150node2 multipathd: dm-2: add map (uevent)
Jul  3 12:33:50 sun4150node2 multipathd: dm-2: devmap already registered
Jul 3 12:33:50 sun4150node2 kernel: device-mapper: multipath: Failing path 8:32.
Jul  3 12:33:50 sun4150node2 kernel: iscsi: cmd 0x12 is not queued (8)
Jul  3 12:33:50 sun4150node2 multipathd: 8:32: mark as failed
Jul  3 12:33:50 sun4150node2 multipathd: mpath0: remaining active paths: 1

I'd be most grateful if someone can help me make sense of this. And preferably help me find and fix the bug.

multipath-tools version is 0.4.7.rhel5.13.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel

[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux