-----Original Message-----
From: SUVANKAR MOITRA [mailto:suvankar_moitra@xxxxxxxxx]
Sent: Friday, July 06, 2007 03:33 AM Pacific Standard Time
To: linux clustering
Subject: Re: SV: [linux-cluster] multipath issue... Smells of hardware issue.
Hi,
Please install the device driver in failover mode.
regards
Suvankar
--- Kristoffer Lippert <kristoffer.lippert@xxxxxxxx> wrote:
> Hi,
>
> Thank you very much for the explanation.
>
> Under no circumstances should the hardware take 5
> minutes to perform a readsector0 read, not even when
> the command queue is very long.
> I've tried copying files to and from the SAN, and
> I've had a little program called sys_basher
> working the disks continuously since last Friday
> (almost a week), and I have not been able to
> reproduce the error. Before, I could reproduce it
> within an hour by copying files.
> I've only seen the error on one server, and I've
> changed nothing. (Well, obviously something must
> have changed, since the error seems to be gone.)
>
> I get a throughput of about 120mb/sec on the SAN
> using GFS1. It's fast enough for my use (which is
> large files for a website). Is it far below the
> expected throughput?
>
> Kind regards
> Kristoffer
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On behalf
> of Benjamin Marzinski
> Sent: 5 July 2007 22:01
> To: linux clustering
> Subject: Re: [linux-cluster] multipath issue... Smells
> of hardware issue.
>
> On Fri, Jun 29, 2007 at 05:23:20PM +0200, Kristoffer
> Lippert wrote:
> > Hi,
> >
> > I have a setup with two identical RX200s3 FuSi servers talking to a SAN
> > (SX60 + extra controller), and that works fine with gfs1.
> >
> > I do, however, see some errors on one of the servers. They show up in my
> > message log only now and then (always under load, though I can't
> > load the server enough to force the error).
> >
> > The error says:
> > Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed
> > Jun 28 15:44:17 app02 multipathd: main_disk_volume1: remaining active paths: 1
> > Jun 28 15:44:17 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
> > Jun 28 15:44:17 app02 kernel: end_request: I/O error, dev sdb, sector 705160231
> > Jun 28 15:44:17 app02 kernel: device-mapper: multipath: Failing path 8:16.
> > Jun 28 15:44:22 app02 multipathd: sdb: readsector0 checker reports path is up
> > Jun 28 15:44:22 app02 multipathd: 8:16: reinstated
> > Jun 28 15:44:22 app02 multipathd: main_disk_volume1: remaining active paths: 2
> > Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed
> > Jun 28 15:46:02 app02 multipathd: main_disk_volume1: remaining active paths: 1
> > Jun 28 15:46:02 app02 kernel: sd 3:0:0:0: SCSI error: return code = 0x00070000
> > Jun 28 15:46:02 app02 kernel: end_request: I/O error, dev sdc, sector 739870727
> > Jun 28 15:46:02 app02 kernel: device-mapper: multipath: Failing path 8:32.
> > Jun 28 15:46:06 app02 multipathd: sdc: readsector0 checker reports path is up
> > Jun 28 15:46:06 app02 multipathd: 8:32: reinstated
> > Jun 28 15:46:06 app02 multipathd: main_disk_volume1: remaining active paths: 2
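
For anyone reading the log excerpt above, here is a rough Python sketch of how the two numeric fields can be decoded. It assumes the usual Linux SCSI result layout (status, message, host and driver bytes, from low to high) and the classic static block-device numbering (major 8, 16 minors per sd disk). The function names are made up for illustration and are not part of multipath-tools or the kernel.

```python
# Subset of host-byte codes as defined in the Linux SCSI headers.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def host_byte_name(result):
    """Return the symbolic host-byte name, or a hex string if unknown."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, hex(host))


def sd_name(major, minor):
    """Map a major:minor pair to an sd device name (assumes major 8 only)."""
    if major != 8:
        raise ValueError("this sketch only handles major 8 (first 16 sd disks)")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)


if __name__ == "__main__":
    print(host_byte_name(0x00070000))        # host byte of the logged return code
    print(sd_name(8, 16), sd_name(8, 32))    # the two failing paths
```

Under those assumptions the logged return code carries only a host-byte error (the status, message and driver bytes are zero), and 8:16 and 8:32 map to the same sdb and sdc named in the kernel lines.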
> >
> > To me it looks like a fiber that bounces up and down. (There is no switch
> > involved.)
> >
> > Sometimes I only get a slightly shorter version:
> > Jun 29 09:04:32 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
> > Jun 29 09:04:32 app02 kernel: end_request: I/O error, dev sdb, sector 2782490295
> > Jun 29 09:04:32 app02 kernel: device-mapper: multipath: Failing path 8:16.
> > Jun 29 09:04:32 app02 multipathd: 8:16: mark as failed
> > Jun 29 09:04:32 app02 multipathd: main_disk_volume1: remaining active paths: 1
> > Jun 29 09:04:37 app02 multipathd: sdb: readsector0 checker reports path is up
> > Jun 29 09:04:37 app02 multipathd: 8:16: reinstated
> > Jun 29 09:04:37 app02 multipathd: main_disk_volume1: remaining active paths: 2
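
To see how long each path stays down, the multipathd lines can be paired up mechanically. The following is only a sketch: it assumes the syslog format shown in this thread, with each entry on a single unwrapped line, and it assumes the year, since syslog timestamps do not carry one. The sample lines are copied from the Jun 28 excerpt; the function and variable names are hypothetical.

```python
import re
from datetime import datetime

# multipathd events copied from the Jun 28 excerpt, one entry per line.
SAMPLE_LOG = """\
Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed
Jun 28 15:44:17 app02 multipathd: main_disk_volume1: remaining active paths: 1
Jun 28 15:44:22 app02 multipathd: 8:16: reinstated
Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed
Jun 28 15:46:06 app02 multipathd: sdc: readsector0 checker reports path is up
Jun 28 15:46:06 app02 multipathd: 8:32: reinstated
"""

# Matches only the per-path "mark as failed" / "reinstated" events.
EVENT_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ multipathd: "
    r"(\d+:\d+): (mark as failed|reinstated)$"
)


def outage_durations(log_text, year=2007):
    """Return (path, seconds) for each failed -> reinstated pair, in log order."""
    failed_at = {}
    outages = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # skip kernel lines and other multipathd messages
        stamp, path, event = match.groups()
        # The year is an assumption; syslog lines omit it.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if event == "mark as failed":
            failed_at[path] = when
        elif path in failed_at:
            delta = when - failed_at.pop(path)
            outages.append((path, delta.total_seconds()))
    return outages


if __name__ == "__main__":
    for path, seconds in outage_durations(SAMPLE_LOG):
        print(f"path {path} was down for {seconds:.0f} s")
```

On the sample above this reports 5 seconds for 8:16 and 4 seconds for 8:32, so in these excerpts each path comes back within a few seconds of being failed.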
> >
> > Any suggestions, other than to start swapping hardware?
>
> It's possible that your SCSI device is timing out
> the SCSI read command from the readsector0 path
> checker, which appears to be what your setup is
> using to check the path status. This checker has
> its timeout set to 5 minutes, but I suppose it is
> possible for a read to take that long if your
> hardware is flaky. If you're willing to recompile
> the code, you can change this default by changing
> DEF_TIMEOUT in libcheckers/checkers.h. DEF_TIMEOUT
> is the SCSI command timeout in milliseconds.
>
> Otherwise, if you are only seeing this on one
> server, swapping hardware seems like a reasonable
> thing to try.
>
> -Ben
>
> > Kind regards
> >
> > Kristoffer Lippert
> > System administrator
> > JP/Politiken A/S
> > Online Magasiner
> >
> > Tlf. +45 8738 3032
> > Cell. +45 6062 8703
>
> > --
> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster
>