Re: Fwd: IO error on DM device

Pardon the top-post.

When a port is disabled, unplugged, or otherwise changes state, the switch issues an RSCN (Registered State Change Notification) event. This is normal. All the HBAs still connected to the switch (and in the same zone, if I understand correctly) will see this RSCN event. It is propagated as a DID_BUS_BUSY, which is the 0x20000 return code you are seeing, and it should be transient. However, when this error is caught in a multipath environment, it seems to cause all the other IO paths to die.
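For anyone decoding these return codes by hand: the Linux SCSI result word packs four bytes (status, message, host, driver), and the host byte sits in bits 16-23. Below is a minimal Python sketch of that decoding. The function name `decode_scsi_result` is made up for illustration; the DID_* values are the host-byte codes from the kernel's include/scsi/scsi.h.

```python
# Host-byte codes from the Linux kernel's include/scsi/scsi.h
# (only the long-standing ones are listed here).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four component bytes.

    Hypothetical helper for illustration; not part of any library.
    """
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,          # SCSI status byte (bits 0-7)
        "msg": (result >> 8) & 0xFF,      # message byte (bits 8-15)
        "host": host,                     # host (HBA/transport) byte (bits 16-23)
        "driver": (result >> 24) & 0xFF,  # driver byte (bits 24-31)
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x20000))
```

For 0x20000 the status, message, and driver bytes are all zero and the host byte is 0x02, i.e. DID_BUS_BUSY, which matches the reading above.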

For md, the solution is to use mdmpd (md's version of the multipath daemon). This keeps the "innocent" IO paths up and running. Without it, unplugging one HBA will cause all the IO paths to fail.

So based on the comments below and the steps to reproduce, I'd say this is expected behavior, since multipathd is turned off. What happens if multipathd is left running? The report implies that the failure still occurs with multipathd running, but takes much longer to reproduce. I am no expert on multipathd, so perhaps one of the dm developers can comment on how multipathd handles these SCSI errors.

Ryan



Jonathan E Brassow wrote:
>
This is the bus busy error that makes it through the multipath drivers... Do you guys have any comments/status that you can give on this issue?

brassow

Begin forwarded message:

    *From:* "Murthy, Narasimha Doraswamy (STSD)" <narasimha.murthy@xxxxxx>
    *Date:* March 29, 2006 10:03:03 AM CST
    *To:* "Alasdair G Kergon" <agk@xxxxxxxxxx>
    *Cc:* device-mapper development <dm-devel@xxxxxxxxxx>, Securelinux
    <securelinux@xxxxxx>
    *Subject:* IO error on DM device
    *Reply-To:* device-mapper development <dm-devel@xxxxxxxxxx>

    Hi Alasdair,

    We are seeing an IO error on a DM device when the HBA ports of
    another host, seen through the same switch, are disabled/enabled.
    We do not understand why the paths are failed when ports on
    other hosts are disabled. Please explain.

    Below is the problem description and steps to reproduce.

    *Problem:* I/O error on DM device on one host when HBA ports
    of another host are disabled.

    *OS distros:* RHEL4.0 U2/U3.

    *HOW-TO reproduce the problem:*

    1. Configure two storage arrays (A1, A2) and two hosts (H1, H2) in
    the same zone, so that both hosts can see both arrays. Create and
    present LUNs (L1, L2) from array A1 to host H1.

    2. Stop the multipathd daemon (done for testing, to investigate why
    the IO error occurs when ports of other hosts are failed). If it is
    left running, the problem may take a long time to reproduce.

    3. Start I/O on the DM device representing LUNs L1 and L2 on host
    H1. We used the *dt tool* to exercise IO.

    4. Disable the host ports of host H2, or any port of array A2, one
    after the other (a few times), OR disable and enable the same port
    of the other host a few times (maybe 4-5 times).

    5. The application (dt tool) aborts with an IO error on host H1.


    =====

    Snippet of *syslog output* (while doing I/O on /dev/dm-0)


    Feb  1 11:47:14 apwtest52 kernel: SCSI error : <2 0 0 1> return code
    = 0x20000

    Feb  1 11:47:14 apwtest52 kernel: end_request: I/O error, dev sda,
    sector 1584600

    Feb  1 11:47:14 apwtest52 kernel: device-mapper: dm-multipath:
    Failing path 8:0.     <=================path failed, after
    disabling/enabling the H2 host port 1

    Feb  1 11:47:14 apwtest52 kernel: end_request: I/O error, dev sda,
    sector 1584608

    Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 1 1> return code
    = 0x20000

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sdg,
    sector 861400

    Feb  1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath:
    Failing path 8:96.   <=================path failed, after
    disabling/enabling  the H2 host port 2

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sdg,
    sector 861408

    Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code
    = 0x20000

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 452760

    Feb  1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath:
    Failing path 8:64.  <=================path failed after
    disabling/enabling the H2 host port 1

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 452768

    Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code
    = 0x20000

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 453784

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 453792

    Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code
    = 0x20000

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 454808

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 454816

    Feb  1 11:47:45 apwtest52 kernel: SCSI error : <3 0 0 1> return code
    = 0x20000

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 863960

    Feb  1 11:47:45 apwtest52 kernel: end_request: I/O error, dev sde,
    sector 863968

    Feb  1 11:48:40 apwtest52 kernel: SCSI error : <2 0 1 1> return code
    = 0x20000

    Feb  1 11:48:40 apwtest52 kernel: end_request: I/O error, dev sdc,
    sector 935384

    Feb  1 11:48:40 apwtest52 kernel: device-mapper: dm-multipath:
    Failing path 8:32.  <================= after disabling/enabling  the
    H2 host port 2

    Feb  1 11:48:40 apwtest52 kernel: end_request: I/O error, dev sdc,
    sector 935392

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116924    <============All paths to the device
    /dev/dm-0 failed

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116925

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116926

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116927

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116928

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116929

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116930

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116931

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116932

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116933

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116934

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116935

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116936

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116937

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116938

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116939

    Feb  1 11:48:40 apwtest52 kernel: Buffer I/O error on device dm-0,
    logical block 116940
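
The "Failing path" entries give major:minor numbers, while the end_request lines name sd devices. As a cross-check, here is a small Python sketch that pulls the failed paths out of a log and maps them to sd names. It assumes the classic numbering for SCSI disk major 8 (16 minors per disk, so minor // 16 is the disk index); the helper names and the sample text are illustrative, with the sample lines unwrapped from the excerpt above.

```python
import re

# Matches e.g. "... 11:47:14 ... dm-multipath: Failing path 8:0."
FAIL_RE = re.compile(r"(\d\d:\d\d:\d\d) .*Failing path (\d+):(\d+)")


def sd_name(major, minor):
    """Map a major:minor pair to an sd device name.

    Assumption: major 8 uses 16 minors per disk (sda..sdp).
    Anything else is returned as a plain "major:minor" string.
    """
    if major != 8:
        return f"{major}:{minor}"
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else f"{name}{part}"


def failed_paths(log_text):
    """Return (timestamp, device) for every 'Failing path' log line."""
    result = []
    for line in log_text.splitlines():
        m = FAIL_RE.search(line)
        if m:
            result.append((m.group(1), sd_name(int(m.group(2)), int(m.group(3)))))
    return result


SAMPLE = """\
Feb  1 11:47:14 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:0.
Feb  1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:96.
Feb  1 11:47:45 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:64.
Feb  1 11:48:40 apwtest52 kernel: device-mapper: dm-multipath: Failing path 8:32.
"""

if __name__ == "__main__":
    for ts, dev in failed_paths(SAMPLE):
        print(ts, dev)
```

On the four entries above this yields sda, sdg, sde, and sdc, in that order, which agrees with the devices named in the neighbouring end_request lines.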

    *Observations:*

    As we fail ports on the other host, paths of the dm device are
    failed. Each subsequent disabling/enabling of a port (i.e. A2 or H2
    ports) results in more path failures, until all paths have failed,
    which in turn results in an IO error on RHEL4.0 U2/U3.

    Through the device-mapper debug driver we find that there is no
    valid path in __choose_pgpath() and that m->current_pgpath (m is a
    pointer to struct multipath) is NULL when it comes to map_io() in
    dm-mpath.c.
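
To make the all-paths-failed condition concrete, here is a deliberately simplified Python model of the selection logic described above. It is not the kernel code: the class and method names are invented, priority groups and repeat counts are ignored, and the `queue_if_no_path` flag (a dm-multipath feature that does not appear in this report) is included only to show the alternative to an immediate error.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Path:
    dev: str
    active: bool = True


@dataclass
class Multipath:
    """Toy stand-in for struct multipath: one flat list of paths."""
    paths: List[Path]
    queue_if_no_path: bool = False
    current: Optional[Path] = None
    queued: List[int] = field(default_factory=list)

    def fail_path(self, dev):
        # Mark the path failed; forget it if it was the current one.
        for p in self.paths:
            if p.dev == dev:
                p.active = False
        if self.current is not None and not self.current.active:
            self.current = None

    def choose_path(self):
        # Rough analogue of __choose_pgpath(): first usable path, else None.
        self.current = next((p for p in self.paths if p.active), None)

    def map_io(self, sector):
        # Rough analogue of map_io(): pick a path if none is selected.
        if self.current is None:
            self.choose_path()
        if self.current is None:
            if self.queue_if_no_path:
                self.queued.append(sector)  # hold the IO for later retry
                return "queued"
            return "EIO"                    # no valid path: IO error
        return self.current.dev


if __name__ == "__main__":
    mp = Multipath([Path(d) for d in ("sda", "sdc", "sde", "sdg")])
    print(mp.map_io(0))
    for d in ("sda", "sdg", "sde", "sdc"):
        mp.fail_path(d)
    print(mp.map_io(1))
```

In this model, once the last path has been failed and nothing reinstates any of them, every subsequent map_io() call finds no current path and returns an error unless queueing is enabled.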

    Another observation is that we do not see any IO errors when the
    same test is executed on SLES9 SP3/SP4.

    Please provide some pointers on why we are seeing this behavior, or
    let us know whether this is a known issue at this point in time.

    Thanks and regards

    -Murthy







--
    dm-devel@xxxxxxxxxx
    https://www.redhat.com/mailman/listinfo/dm-devel

