This is the bus-busy error that makes it through the multipath
drivers. Do you guys have any comments or status you can give on
this issue?
brassow
Begin forwarded message:
*From: *"Murthy, Narasimha Doraswamy (STSD)" <narasimha.murthy@xxxxxx>
*Date: *March 29, 2006 10:03:03 AM CST
*To: *"Alasdair G Kergon" <agk@xxxxxxxxxx>
*Cc: *device-mapper development <dm-devel@xxxxxxxxxx>, Securelinux
<securelinux@xxxxxx>
*Subject: *IO error on DM device
*Reply-To: *device-mapper development <dm-devel@xxxxxxxxxx>
Hi Alasdair,
We are seeing I/O errors on a DM device when the HBA ports of another
host, seen through the same switch, are disabled/enabled.
We do not understand why the paths fail when ports on other hosts are
disabled. Please explain.
Below are the problem description and the steps to reproduce.
*Problem:* I/O error on a DM device on one host when the HBA ports
of another host are disabled.
*OS distros:* RHEL4.0 U2/U3.
*How to reproduce the problem:*
1. Configure two storage arrays (A1, A2) and two hosts (H1, H2) in the
same zone, so that both hosts can see both arrays. Create LUNs (L1,
L2) on array A1 and present them to host H1.
2. Stop the multipathd daemon (for testing purposes, to isolate why
I/O errors occur when ports of other hosts fail). If multipathd is
left running, the problem may take a long time to reproduce.
3. On host H1, start I/O on the DM devices representing LUNs L1 and
L2. We used the *dt tool* to exercise I/O.
4. Disable the host ports of host H2, or any port of array A2, one
after the other (a few times), OR disable and enable the same port of
the other host a few times (maybe 4-5 times).
5. The application (dt tool) on host H1 aborts with an I/O error.
=====
Snippet of *syslog output* (while doing I/O on /dev/dm-0):
Feb 1 11:47:14 apwtest52 kernel: SCSI error : <2 0 0 1> return code = 0x20000
Feb 1 11:47:14 apwtest52 kernel: end_request: I/O error, dev sda, sector 1584600
Feb 1 11:47:14 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:0. <================= path failed after disabling/enabling the H2 host port 1
Feb 1 11:47:14 apwtest52 kernel: end_request: I/O error, dev sda, sector 1584608
Feb 1 11:47:45 apwtest52 kernel: SCSI error : <3 0 1 1> return code = 0x20000
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sdg, sector 861400
Feb 1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:96. <================= path failed after disabling/enabling the H2 host port 2
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sdg, sector 861408
Feb 1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 452760
Feb 1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:64. <================= path failed after disabling/enabling the H2 host port 1
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 452768
Feb 1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 453784
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 453792
Feb 1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 454808
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 454816
Feb 1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code = 0x20000
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 863960
Feb 1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde, sector 863968
Feb 1 11:48:40 apwtest52 kernel: SCSI error : <2 0 1 1> return code = 0x20000
Feb 1 11:48:40 apwtest52 kernel: end_request: I/O error, dev sdc, sector 935384
Feb 1 11:48:40 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:32. <================= after disabling/enabling the H2 host port 2
Feb 1 11:48:40 apwtest52 kernel: end_request: I/O error, dev sdc, sector 935392
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116924 <============ all paths to the device /dev/dm-0 failed
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116925
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116926
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116927
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116928
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116929
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116930
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116931
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116932
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116933
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116934
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116935
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116936
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116937
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116938
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116939
Feb 1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0, logical block 116940
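The "return code = 0x20000" and "Failing path 8:N" fields in the log above can be decoded mechanically. Below is a minimal sketch in Python. It assumes the Linux convention that the SCSI result word packs the driver, host, message and status bytes as (driver << 24) | (host << 16) | (msg << 8) | status. It also assumes that block major 8 is the first SCSI disk major, with 16 minors per disk. The helper names are hypothetical. Run on the values from this log, the sketch shows a host byte of 0x02 (DID_BUS_BUSY, the bus-busy error), and it shows that the four failed paths are sda, sdc, sde and sdg, which matches the end_request lines.

```python
# Hypothetical helpers for decoding fields from the kernel log above.
# Assumption: Linux packs the SCSI result as
#   (driver_byte << 24) | (host_byte << 16) | (msg_byte << 8) | status_byte

HOST_BYTE_NAMES = {  # a few of the Linux DID_* host byte codes
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
}


def decode_scsi_result(result: int) -> dict:
    """Split a SCSI midlayer result code into its four bytes."""
    host = (result >> 16) & 0xFF
    return {
        "driver_byte": (result >> 24) & 0xFF,
        "host_byte": host,
        "host_name": HOST_BYTE_NAMES.get(host, f"0x{host:02x}"),
        "msg_byte": (result >> 8) & 0xFF,
        "status_byte": result & 0xFF,
    }


def sd_name(major: int, minor: int) -> str:
    """Map a major:minor on block major 8 (16 minors per disk) to sdX[N]."""
    if major != 8:
        raise ValueError("only block major 8 (sda..sdp) is handled here")
    index, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + index)
    return name + (str(partition) if partition else "")


if __name__ == "__main__":
    print(decode_scsi_result(0x20000)["host_name"])
    for dev in ("8:0", "8:32", "8:64", "8:96"):  # paths failed in the log
        major, minor = map(int, dev.split(":"))
        print(dev, "->", sd_name(major, minor))
```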
*Observations:*
As we fail ports on the other host, paths of the DM device fail, and
each subsequent disable/enable of a port (i.e. the A2 or H2 ports)
fails more paths. Eventually every path has failed, which in turn
results in I/O errors on RHEL4.0 U2/U3.
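The escalation described above can be read directly off the "Failing path" lines of the log. The Python sketch below rests on two assumptions. The first is that dm-0 has exactly the four paths 8:0, 8:32, 8:64 and 8:96, which is what the log suggests. The second is that no failed path is ever reinstated, because multipathd (which runs the path checkers) was stopped in step 2. The function name is hypothetical. The sketch tracks the set of failed paths and reports when none are left. The result, 11:48:40, is the same timestamp as the first "Buffer I/O error on device dm-0" line.

```python
import re
from typing import Optional, Set

# The "Failing path" lines from the syslog snippet above (annotations dropped).
LOG = """\
Feb 1 11:47:14 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:0.
Feb 1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:96.
Feb 1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:64.
Feb 1 11:48:40 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:32.
"""

# Assumption: dm-0 has exactly these four paths, and none is reinstated
# during the test because multipathd is stopped.
ALL_PATHS = {"8:0", "8:32", "8:64", "8:96"}

# Captures the "Mon D HH:MM:SS" timestamp and the failed major:minor.
FAILING = re.compile(r"(\w+ \d+ [\d:]+) .*Failing path (\d+:\d+)\.")


def first_all_paths_down(log: str, all_paths: Set[str]) -> Optional[str]:
    """Return the timestamp at which every path has failed, or None."""
    failed: Set[str] = set()
    for line in log.splitlines():
        m = FAILING.match(line)
        if not m:
            continue
        timestamp, dev = m.groups()
        failed.add(dev)
        remaining = len(all_paths - failed)
        print(f"{timestamp}: path {dev} failed, {remaining} path(s) left")
        if remaining == 0:
            return timestamp
    return None


if __name__ == "__main__":
    print("all paths down at", first_all_paths_down(LOG, ALL_PATHS))
```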
Using the device-mapper debug driver, we find that there is no valid
path in *__choose_pgpath()*, and *m->current_pgpath* (m is a pointer to
struct multipath) is NULL when execution reaches map_io() in
dm-mpath.c.
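The following Python model is a simplified, hypothetical illustration of the condition reported above. It is not the dm-mpath.c code. The names only loosely mirror __choose_pgpath(), m->current_pgpath and map_io(), and the layout of two priority groups with two paths each is an assumption. When the current path is missing or has failed, the model picks the first active path from the first group that still has one. If none is left, current_pgpath remains None and the I/O is failed with -EIO. The real driver can also queue I/O in this situation (the queue_if_no_path feature), but that case is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, heavily simplified model; NOT the kernel's dm-mpath.c.
EIO = 5  # Linux errno value for an I/O error


@dataclass
class Path:
    dev: str
    active: bool = True


@dataclass
class PriorityGroup:
    paths: List[Path]


@dataclass
class Multipath:
    priority_groups: List[PriorityGroup]
    current_pgpath: Optional[Path] = None


def choose_pgpath(m: Multipath) -> None:
    """Set current_pgpath to the first active path, or None if none is left."""
    m.current_pgpath = None
    for pg in m.priority_groups:
        for path in pg.paths:
            if path.active:
                m.current_pgpath = path
                return


def map_io(m: Multipath) -> int:
    """Return 0 if the I/O can be mapped to a path, else -EIO (no queueing)."""
    if m.current_pgpath is None or not m.current_pgpath.active:
        choose_pgpath(m)
    if m.current_pgpath is None:
        return -EIO  # no valid path: the error is passed up to the caller
    return 0


if __name__ == "__main__":
    groups = [PriorityGroup([Path("sda"), Path("sdc")]),
              PriorityGroup([Path("sde"), Path("sdg")])]
    m = Multipath(groups)
    for dev in ("sda", "sdg", "sde", "sdc"):  # failure order seen in the log
        for pg in groups:
            for p in pg.paths:
                if p.dev == dev:
                    p.active = False
        print(dev, "failed -> map_io returns", map_io(m))
```

With the failure order from the log, map_io() keeps returning 0 until sdc, the last remaining path, fails. At that point it returns -5.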
We also see no I/O errors when the same test is executed on SLES9
SP3/SP4.
Could you provide some pointers on why we are seeing this behavior,
or is this a known issue at this point in time?
Thanks and regards
-Murthy
--
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel