Re: Vanishing array/filesystem....

I haven't had any problems like the ones you describe, but both motherboards
you mention use Via chipsets, and Via is known for its shoddy PCI
implementation. That would be the first thing I'd suspect. A few more
thoughts inline below.
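
If you want to confirm what is actually on the bus and what PCI latency each
card was granted, standard pciutils will show it (the grep patterns here are
just illustrative):

    # Identify the Via bridge chips and the Promise controllers:
    lspci | grep -iE 'via|promise'

    # Show the PCI latency timer each device ended up with:
    lspci -vv | grep -iE '^[0-9a-f]|latency'

Raising the latency timer of the Promise cards with setpci is a commonly
suggested tweak on Via boards, though I'd treat it as a workaround rather
than a fix.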


----- Original Message -----
From: "Mike Kirk" <kenora@xxxxxxxxxx>
Newsgroups: gmane.linux.ataraid
To: <Ataraid-list@xxxxxxxxxx>
Sent: Saturday, August 17, 2002 10:07 AM
Subject: Vanishing array/filesystem....


> Hello all,
>
> I have a linux (/dev/md0) raid 5 array consisting of 8 Western Digital
> 800BB (80gig) drives. They are attached to 2 Promise PDC20268 (TX2
> ata100 - non-raid) PCI controllers. They are configured as 7+1, no
> spare. Boot screen and dmesg show they have their own IRQs and are
> seen as ide2+3, 4+5. Drives all show up as /dev/hde -> /dev/hdl. Each
> drive is manually jumpered to master or slave as appropriate (no cable
> select) and the Promise cards both have the latest BIOS applied.
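
For reference, under raidtools that layout would correspond to a raidtab
roughly like the one below. I'm assuming single whole-disk partitions
hde1..hdl1 and a 64k chunk size, so adjust to whatever you actually built:

    raiddev /dev/md0
        raid-level              5
        nr-raid-disks           8
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              64
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdf1
        raid-disk               1
        device                  /dev/hdg1
        raid-disk               2
        device                  /dev/hdh1
        raid-disk               3
        device                  /dev/hdi1
        raid-disk               4
        device                  /dev/hdj1
        raid-disk               5
        device                  /dev/hdk1
        raid-disk               6
        device                  /dev/hdl1
        raid-disk               7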
>
> What happens is that after anywhere from 15 minutes to 24 hours the
> filesystem/mount point stops responding. I.e. /dev/md0 is ext3 mounted
> on "/export1", and anything to do with /export1 stops. "ls -l" never
> returns and you can't CTRL-C it. There are no /var/adm/messages logs,
> no kernel panic, nothing on the console. The Samba (smbd) process that
> is exporting this filesystem cannot be kill -9'd by root. Touching any
> drive with hdparm never returns and you can't CTRL-C it. But /, /boot,
> and /export2 (non-raid) filesystems all continue to function normally.
> /proc/mdstat shows all drives up ("U"). The box continues to function
> as normal (firewall/NAT host), filtering packets and hosting ssh
> sessions. "top" shows nothing spinning.
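
The unkillable processes are the real clue here: anything that shrugs off
kill -9 is sitting in uninterruptible sleep ("D" state) inside the kernel,
blocked on an I/O request that never completes. That points below ext3 and
md, at the IDE driver, the controllers, or the PCI bus. Next time it
wedges, something like this should confirm it (assuming procps, and a
kernel built with CONFIG_MAGIC_SYSRQ for the second part):

    # Anything in "D" state is stuck in the kernel waiting on I/O:
    ps axo pid,stat,wchan:25,comm | awk '$2 ~ /D/'

    # Dump a stack trace of every task to the console/syslog, which
    # shows exactly where the stuck processes are blocked:
    echo t > /proc/sysrq-trigger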
>
> I have tested this array on an Abit KT7 (Via KT133 chipset - 3x256MB
> pc133 - Athlon 1100) and an Abit KR7A (Via KT266A chipset - 2x512MB
> ddr266 - XP 1900+), both with the latest BIOS and various memory
> timings (i.e. stock non-interleaved, configured by SPD, and 4-way low
> wait-state tweaks). Neither system is overclocked. Both run Enermax
> 430watt power supplies (2 different models purchased a year apart).
> On both systems I tried the 2.4.18, 2.4.19rc3, and 2.4.19 kernels. I
> have shuffled/removed/replaced their network cards (3 different
> brands) and have moved the controllers around to various slots so
> they were/weren't sharing IRQs with other devices. In both cases
> /export1 becomes unresponsive after at most 24 hours. Copying large
> amounts of data to the partition (either locally from another drive
> or remotely via samba) seems to make it fail earlier, but I cannot
> reliably reproduce the problem... other than that it has never worked
> for more than a day.
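
After each reshuffle it's worth verifying what the BIOS actually routed,
since the slot layout doesn't always tell the whole story:

    # Shared IRQ lines list several devices on one row:
    cat /proc/interrupts

    # IRQ and memory/IO resources assigned to each Promise card:
    lspci -v | grep -iA4 promise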
>
> 5 of the 8 drives were pulled from a different host to make the array,
> and 3 were purchased new. Individually they all pass a badblocks run.
> I ran both systems overnight with memtest86 and no memory errors were
> found.
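
One caveat: a drive-at-a-time badblocks pass barely loads the PCI bus,
while md hits all eight spindles through both controllers at once. Reading
them all in parallel is much closer to the failing workload and might let
you trigger the hang on demand (device names taken from your mail):

    # Stream from all eight drives simultaneously to stress both
    # Promise cards and the PCI bus the way the array does:
    for d in hde hdf hdg hdh hdi hdj hdk hdl; do
        dd if=/dev/$d of=/dev/null bs=1024k &
    done
    wait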
>
> I am stumped. The array has enough data on it that I cannot easily
> reconfigure it to try combinations of fewer drives. Every failure
> requires about 3 hours to resync and fsck on boot. Since I have tried
> 2 systems, I'm wondering if anybody has had any issues with the WD
> 800BB model drives, or with the Promise controllers?
>
> Should I just buy a 3ware 8-port controller?
>
> Any suggestions are appreciated.
>
> Thanks,
>
>     Mike
>
> PCI: No IRQ known for interrupt pin A of device 00:11.1. Please try using
> pci=biosirq.
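
That boot message looks significant: on the KR7A, 00:11.1 is most likely
the IDE function of the Via southbridge, and an interrupt the kernel can't
route would fit these symptoms. It costs nothing to try the kernel's own
suggestion. Assuming you boot with LILO, add the append line to the image
stanza in /etc/lilo.conf and re-run /sbin/lilo:

    image=/boot/vmlinuz-2.4.19
        label=linux
        read-only
        root=/dev/hda1            # adjust to your real root device
        append="pci=biosirq"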
>
> _______________________________________________
> 
> Ataraid-list@xxxxxxxxxx
> https://listman.redhat.com/mailman/listinfo/ataraid-list
>




