Re: SV: [linux-cluster] multipath issue... Smells of hardware issue.

SUVANKAR MOITRA <suvankar_moitra@xxxxxxxxx> · Fri, 6 Jul 2007 02:32:40 -0700 (PDT)

Hi,

Please install the device driver in failover mode.

regards

Suvankar
--- Kristoffer Lippert <kristoffer.lippert@xxxxxxxx>
wrote:

> Hi,
> 
> Thank you very much for the explanation.
> 
> The hardware should under no circumstances take 5
> minutes to perform a sector read, not even when the
> command queue is very long.
> I've tried copying files to and from the SAN, and
> I've had a little program called sys_basher
> working the disks continuously since last Friday
> (almost a week), and I have not been able to
> reproduce the error. Before, I could produce it
> within an hour by copying files.
> I've only seen the error on one server, and I've
> changed nothing. (Well, obviously something must
> have changed, since the error seems to be gone.)
> 
> I get a throughput of about 120 MB/sec on the SAN
> using GFS1. It's fast enough for my use (which is
> large files for a website). Is it far below the
> expected throughput?
> 
> Kind regards
> Kristoffer
> 
> 
> 
> 
> -----Original message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On behalf
> of Benjamin Marzinski
> Sent: 5 July 2007 22:01
> To: linux clustering
> Subject: Re: [linux-cluster] multipath issue... Smells
> of hardware issue.
> 
> On Fri, Jun 29, 2007 at 05:23:20PM +0200, Kristoffer Lippert wrote:
> >    Hi,
> > 
> >    I have a setup with two identical RX200s3 FuSi servers talking to a SAN
> >    (SX60 + extra controller), and that works fine with gfs1.
> > 
> >    I do however see some errors on one of the servers. It's in my message log
> >    and only now and then (though always under load, but I can't
> >    load it and thereby force it to give the error).
> > 
> >    The error says:
> >    Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed
> >    Jun 28 15:44:17 app02 multipathd: main_disk_volume1: remaining active paths: 1
> >    Jun 28 15:44:17 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
> >    Jun 28 15:44:17 app02 kernel: end_request: I/O error, dev sdb, sector 705160231
> >    Jun 28 15:44:17 app02 kernel: device-mapper: multipath: Failing path 8:16.
> >    Jun 28 15:44:22 app02 multipathd: sdb: readsector0 checker reports path is up
> >    Jun 28 15:44:22 app02 multipathd: 8:16: reinstated
> >    Jun 28 15:44:22 app02 multipathd: main_disk_volume1: remaining active paths: 2
> >    Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed
> >    Jun 28 15:46:02 app02 multipathd: main_disk_volume1: remaining active paths: 1
> >    Jun 28 15:46:02 app02 kernel: sd 3:0:0:0: SCSI error: return code = 0x00070000
> >    Jun 28 15:46:02 app02 kernel: end_request: I/O error, dev sdc, sector 739870727
> >    Jun 28 15:46:02 app02 kernel: device-mapper: multipath: Failing path 8:32.
> >    Jun 28 15:46:06 app02 multipathd: sdc: readsector0 checker reports path is up
> >    Jun 28 15:46:06 app02 multipathd: 8:32: reinstated
> >    Jun 28 15:46:06 app02 multipathd: main_disk_volume1: remaining active paths: 2
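
As a side note on the "return code = 0x00070000" lines above: to my understanding, Linux packs the SCSI result into one 32-bit word made of four byte fields (status, message, host, driver, from lowest to highest byte). The sketch below only splits the value from the log into those fields; the function name is made up for illustration. If that layout holds for this kernel, the non-zero part is the host byte, 0x07, which I believe is DID_ERROR in the kernel's scsi.h, i.e. an error reported by the host adapter/driver side rather than a status returned by the disk itself.

```python
def split_scsi_result(result: int) -> dict:
    """Split a 32-bit SCSI result word into its four byte fields.

    Assumed layout (lowest byte first): status, message, host, driver.
    """
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": (result >> 16) & 0xFF,
        "driver_byte": (result >> 24) & 0xFF,
    }


# Value copied from the kernel messages quoted above.
fields = split_scsi_result(0x00070000)
for name, value in fields.items():
    print(f"{name}: 0x{value:02x}")
```
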
> > 
> >    To me it looks like a fiber that bounces up and down. (There is no switch
> >    involved.)
> > 
> >    Sometimes I only get a slightly shorter version:
> >    Jun 29 09:04:32 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
> >    Jun 29 09:04:32 app02 kernel: end_request: I/O error, dev sdb, sector 2782490295
> >    Jun 29 09:04:32 app02 kernel: device-mapper: multipath: Failing path 8:16.
> >    Jun 29 09:04:32 app02 multipathd: 8:16: mark as failed
> >    Jun 29 09:04:32 app02 multipathd: main_disk_volume1: remaining active paths: 1
> >    Jun 29 09:04:37 app02 multipathd: sdb: readsector0 checker reports path is up
> >    Jun 29 09:04:37 app02 multipathd: 8:16: reinstated
> >    Jun 29 09:04:37 app02 multipathd: main_disk_volume1: remaining active paths: 2
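
For anyone wanting to see how long each path actually stayed down, here is a rough sketch that pairs the "mark as failed" and "reinstated" lines from the excerpts above and prints the gap in seconds. The function name, the regular expression and the hard-coded year are my own assumptions (syslog timestamps carry no year), and it only understands the multipathd line format shown in this thread.

```python
import re
from datetime import datetime

# multipathd lines copied from the log excerpts quoted above.
LOG = """\
Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed
Jun 28 15:44:22 app02 multipathd: 8:16: reinstated
Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed
Jun 28 15:46:06 app02 multipathd: 8:32: reinstated
Jun 29 09:04:32 app02 multipathd: 8:16: mark as failed
Jun 29 09:04:37 app02 multipathd: 8:16: reinstated
"""

# timestamp, host, "multipathd:", major:minor of the path, event text
EVENT = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ multipathd: "
    r"(\d+:\d+): (mark as failed|reinstated)$"
)


def outage_seconds(log_text, year=2007):
    """Return (path, seconds) for each failed -> reinstated pair."""
    failed_at = {}  # path -> time it was marked as failed
    outages = []
    for line in log_text.splitlines():
        match = EVENT.match(line.strip())
        if not match:
            continue
        stamp, path, event = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if event == "mark as failed":
            failed_at[path] = when
        elif path in failed_at:
            delta = when - failed_at.pop(path)
            outages.append((path, int(delta.total_seconds())))
    return outages


outages = outage_seconds(LOG)
for path, seconds in outages:
    print(f"path {path} was down for {seconds} s")
```

On these excerpts every path comes back within a few seconds of being failed.
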
> > 
> >    Any suggestions, other than to start swapping hardware?
> 
> It's possible that your SCSI device is timing out
> the SCSI read command from the readsector0 path
> checker, which is what it appears that your setup is
> using to check the path status.  This checker has
> its timeout set to 5 minutes, but I suppose that it
> is possible to take this long if your hardware is
> flaky. If you're willing to recompile the code, you
> can change this default by changing DEF_TIMEOUT in
> libcheckers/checkers.h. DEF_TIMEOUT is the SCSI
> command timeout in milliseconds.
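
Since DEF_TIMEOUT is given in milliseconds, a quick conversion may help when picking a new value. The sketch below just does that arithmetic; the helper name is made up, the 30-second figure is an arbitrary illustration and not a recommendation, and the exact form of the define in checkers.h is an assumption on my part, so check the header before editing it.

```python
def timeout_ms(minutes=0, seconds=0):
    """Convert a timeout given in minutes and seconds to milliseconds."""
    return (minutes * 60 + seconds) * 1000


# The 5-minute default mentioned above, expressed in milliseconds.
current = timeout_ms(minutes=5)

# A shorter value, chosen here only as an example.
shorter = timeout_ms(seconds=30)

print(f"5 minutes  = {current} ms")
print(f"30 seconds = {shorter} ms")
# Assumed shape of the line to edit before recompiling:
print(f"#define DEF_TIMEOUT {shorter}")
```
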
> 
> Otherwise, if you are only seeing this on one
> server, swapping hardware seems like a reasonable
> thing to try.
> 
> -Ben
>  
> >    Mvh / Kind regards
> > 
> >    Kristoffer Lippert
> >    Systemansvarlig
> >    JP/Politiken A/S
> >    Online Magasiner
> > 
> >    Tlf. +45 8738 3032
> >    Cell. +45 6062 8703
> 
> > --
> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster