Hi,

Please install the device driver in failover mode.

Regards,
Suvankar

--- Kristoffer Lippert <kristoffer.lippert@xxxxxxxx> wrote:

> Hi,
>
> Thank you very much for the explanation.
>
> The hardware should under no circumstances take 5
> minutes to perform a readsector, not even when the
> command queue is very long.
> I've tried copying files to and from the SAN, and
> I've had a little program called sys_basher
> working the disks continuously since last Friday
> (almost a week), and I have not been able to
> reproduce the error. Before, I could produce it
> within an hour by copying files.
> I've only seen the error on one server, and I've
> changed nothing. (Well, obviously something must
> have changed, since the error seems to be gone.)
>
> I get a throughput of about 120 MB/s on the SAN
> using GFS1. It's fast enough for my use (which is
> large files for a website). Is it far below the
> expected throughput?
>
> Kind regards
> Kristoffer
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On behalf
> of Benjamin Marzinski
> Sent: 5 July 2007 22:01
> To: linux clustering
> Subject: Re: [linux-cluster] multipath issue... Smells
> of hardware issue.
>
> On Fri, Jun 29, 2007 at 05:23:20PM +0200, Kristoffer
> Lippert wrote:
> > Hi,
> >
> > I have a setup with two identical RX200s3 FuSi servers talking to a SAN
> > (SX60 + extra controller), and that works fine with gfs1.
> >
> > I do, however, see some errors on one of the servers. They show up in my
> > message log only now and then (always under load, but I can't load it
> > and thereby force it to give the error).
> >
> > The error says:
> > Jun 28 15:44:17 app02 multipathd: 8:16: mark as failed
> > Jun 28 15:44:17 app02 multipathd: main_disk_volume1: remaining active paths: 1
> > Jun 28 15:44:17 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
> > Jun 28 15:44:17 app02 kernel: end_request: I/O error, dev sdb, sector 705160231
> > Jun 28 15:44:17 app02 kernel: device-mapper: multipath: Failing path 8:16.
> > Jun 28 15:44:22 app02 multipathd: sdb: readsector0 checker reports path is up
> > Jun 28 15:44:22 app02 multipathd: 8:16: reinstated
> > Jun 28 15:44:22 app02 multipathd: main_disk_volume1: remaining active paths: 2
> > Jun 28 15:46:02 app02 multipathd: 8:32: mark as failed
> > Jun 28 15:46:02 app02 multipathd: main_disk_volume1: remaining active paths: 1
> > Jun 28 15:46:02 app02 kernel: sd 3:0:0:0: SCSI error: return code = 0x00070000
> > Jun 28 15:46:02 app02 kernel: end_request: I/O error, dev sdc, sector 739870727
> > Jun 28 15:46:02 app02 kernel: device-mapper: multipath: Failing path 8:32.
> > Jun 28 15:46:06 app02 multipathd: sdc: readsector0 checker reports path is up
> > Jun 28 15:46:06 app02 multipathd: 8:32: reinstated
> > Jun 28 15:46:06 app02 multipathd: main_disk_volume1: remaining active paths: 2
> >
> > To me it looks like a fiber link that bounces up and down. (There is no
> > switch involved.)
> >
> > Sometimes I only get a slightly shorter version:
> > Jun 29 09:04:32 app02 kernel: sd 2:0:0:0: SCSI error: return code = 0x00070000
> > Jun 29 09:04:32 app02 kernel: end_request: I/O error, dev sdb, sector 2782490295
> > Jun 29 09:04:32 app02 kernel: device-mapper: multipath: Failing path 8:16.
> > Jun 29 09:04:32 app02 multipathd: 8:16: mark as failed
> > Jun 29 09:04:32 app02 multipathd: main_disk_volume1: remaining active paths: 1
> > Jun 29 09:04:37 app02 multipathd: sdb: readsector0 checker reports path is up
> > Jun 29 09:04:37 app02 multipathd: 8:16: reinstated
> > Jun 29 09:04:37 app02 multipathd: main_disk_volume1: remaining active paths: 2
> >
> > Any suggestions, other than starting to swap hardware?
>
> It's possible that your SCSI device is timing out
> the SCSI read command from the readsector0 path
> checker, which appears to be what your setup is
> using to check the path status. This checker has
> its timeout set to 5 minutes, but I suppose it
> is possible for a command to take this long if your
> hardware is flaky. If you're willing to recompile the
> code, you can change this default by changing
> DEF_TIMEOUT in libcheckers/checkers.h. DEF_TIMEOUT
> is the SCSI command timeout in milliseconds.
>
> Otherwise, if you are only seeing this on one
> server, swapping hardware seems like a reasonable
> thing to try.
>
> -Ben
>
> > Kind regards
> >
> > Kristoffer Lippert
> > Systems Manager
> > JP/Politiken A/S
> > Online Magasiner
> >
> > Tel. +45 8738 3032
> > Cell. +45 6062 8703
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/linux-cluster