Problem on aic7899 with smp kernel


 




> From: David Chow <davidchow@xxxxxxxxxxxxxxxx>
> Organization: Shaolin Microsystems Ltd.
> Subject: Problem on aic7899 with smp kernel
> 
> Dear all,
> 
> My machine consistently locks up after the system has been powered up for
> around 20 minutes. Each time it locks up, the console dumps the following
> messages. I've read some postings on this list saying it is a
> hardware problem. However, the problem only exists when booting the SMP
> kernel. To me, it seems the problem comes from a badly written driver
> rather than the hardware. My kernel is 2.4.20-18smp, running on a Tyan
> motherboard with dual Athlons and an onboard aic7899, using md RAID-1
> mirroring on Seagate Cheetah 36GB SCSI disks. The system only survives with
> non-SMP kernels. Any help is appreciated. Thanks.
> 
> regards,
> David Chow

I've been running RedHat 8.0 in SMP mode on an HP/Compaq ProLiant ML370
machine. The machine has a pair of Xeons with Hyper-Threading switched on
(to make 4 CPU contexts). It also has an aic7899 SCSI controller with 4 drives
and a 4-way mirror for the / partition (which contains pretty much all
of the RedHat system).

Installing RH8.0 was no problem. For some reason RH9.0 will not install
because it cannot read the CD-ROM, but that's another issue.

The main problem I have found is that the / filesystem is slowly but
surely getting corrupted. After about a week of activity it is bad
enough to need a reinstall, and this is on a 4-way mirror.

Kernel is standard RH8.0: Linux version 2.4.18-14smp
 (bhcompile@xxxxxxxxxxxxxxxxxxxxxxxxxx)
 (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)) #1 SMP Wed Sep 4 12:34:47 EDT 2002

Looking at the Compaq web page, they say it all works fine and the only
important thing is to use their SP22002.exe erase program and set the
BIOS to "linux" mode, no problems here.

They also tell you to install a heap of their useless agents so they can
hack up your box with proprietary crap but none of that includes any
device drivers so it has nothing to do with this problem.

Finally I went to the SuSE site, and they do have some interesting
information: you should use the aic7xxx_old.o module for
running the SCSI controller (I like this module better because it boots faster
and gives more useful /proc/scsi diagnostics). Switching SCSI modules
under RedHat 8.0 is painful because you have to mess with the initrd
file in /boot (maybe someone knows a quick way to do this?).
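One possible shortcut, assuming Red Hat's `mkinitrd` of this era chooses the SCSI driver for the initrd from the `alias scsi_hostadapter` lines in `/etc/modules.conf`: change that alias, then rebuild the image with something like `mkinitrd -f /boot/initrd-2.4.18-14smp.img 2.4.18-14smp`. The Python sketch below covers only the first step (rewriting the alias text); the function name is hypothetical, and you should keep a backup of the original file and check the result before rebuilding.

```python
import re

# Matches "alias scsi_hostadapter <module>" (also scsi_hostadapter1, 2, ...).
ALIAS_LINE = re.compile(r"^(\s*alias\s+scsi_hostadapter\d*\s+)(\S+)(\s*)$")


def switch_scsi_module(conf_text: str, old: str, new: str) -> str:
    """Point every scsi_hostadapter alias that names `old` at `new` instead."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        m = ALIAS_LINE.match(body)
        # Compare the whole module name, so an existing "aic7xxx_old"
        # alias is left alone when old == "aic7xxx".
        if m and m.group(2) == old:
            body = m.group(1) + new + m.group(3)
        out.append(body + ending)
    return "".join(out)
```

Matching the complete module name (rather than a plain string replace) makes the rewrite safe to run twice.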

Anyhow, I did go into the initrd, put in aic7xxx_old.o, made the system
use it at boot, and then reinstalled everything so that my files are OK.
So far things are looking good. At this point it is a matter of time to
see whether my files start to corrupt again...

Anyone who thinks they are having a similar problem can easily check
with "rpm --verify -a": they will see a growing list of files with the
"5" flag (the file's MD5 checksum no longer matches the package
database), a few more each day. After a while executable files
stop executing and then it's game over. The weirdest thing is that even
files that never get written to (i.e. /usr/bin and /bin) start to
corrupt, which doesn't make sense unless some write blocks are going
to completely the wrong address.
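To watch the list grow day by day, one could save each day's output (for example `rpm --verify -a > verify-day2.txt`; the file names are hypothetical) and compare consecutive runs. A minimal Python sketch, assuming the usual `rpm --verify` line layout: a flag string whose third character is the digest ("5") test, an optional attribute marker such as `c` for config files, then the path. The function names are illustrative only.

```python
import re

# One line of `rpm --verify` output: flag string (8 chars on older rpm,
# 9 on newer), optional file-attribute marker (c, d, g, l, r), then the
# path. Lines such as "missing ..." do not match and are ignored.
VERIFY_LINE = re.compile(r"^([SM5DLUGTP.?]{8,9})\s+(?:[cdglr]\s+)?(/.*)$")


def digest_mismatches(verify_output: str) -> set[str]:
    """Return the paths whose digest ('5') flag is set."""
    bad = set()
    for line in verify_output.splitlines():
        m = VERIFY_LINE.match(line.rstrip())
        # Index 2 of the flag string is the digest (MD5) check.
        if m and m.group(1)[2] == "5":
            bad.add(m.group(2))
    return bad


def newly_corrupted(yesterday: set[str], today: set[str]) -> list[str]:
    """Paths that fail the digest check today but did not yesterday."""
    return sorted(today - yesterday)
```

Config files legitimately edited by the administrator will also show a "5", so a steadily growing list of binaries under /bin and /usr/bin is the more telling sign.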

By the way, if anyone is curious about the speed of the ProLiant ML370,
the Xeons clock at 2.8GHz, which gives each thread nearly 5600 BogoMIPS.
RAM bandwidth for linear access (i.e. not hopping around) is just under
5GB per second while you stay inside the L2 cache (which is 512KB)
and drops to just under 4GB per second when you hit main RAM.
RAM bandwidth for a forward-hopping pattern (e.g. a linked list, skip
list or similar) is much worse: about 2GB per second inside the L2
cache and about 128MB per second for main RAM. Obviously there is
a big penalty in access setup times, and I think it speculatively grabs
chunks into cache. Since my main application mostly uses chunk
access, with chunks averaging about 256 bytes, the Xeon seems
pretty good. If your app does a lot of random access in small regions,
then you will probably be unimpressed.
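The reasoning about setup penalties can be made concrete with a simple model: each hop pays a fixed setup latency and then streams its payload at the linear rate, so larger chunks amortize the setup cost. This is a sketch under stated assumptions, not a measurement; only the roughly 4GB per second main-RAM linear rate comes from the figures above, and the 500 ns setup latency is a hypothetical placeholder.

```python
def effective_bandwidth(chunk_bytes: float, setup_ns: float,
                        stream_bytes_per_s: float) -> float:
    """Bytes per second when every chunk pays a fixed setup cost, then streams."""
    seconds_per_chunk = setup_ns * 1e-9 + chunk_bytes / stream_bytes_per_s
    return chunk_bytes / seconds_per_chunk


if __name__ == "__main__":
    STREAM = 4e9         # approx. linear main-RAM rate quoted above
    SETUP_NS = 500.0     # hypothetical per-hop setup latency (assumption)
    for size in (64, 256, 4096):
        mb_s = effective_bandwidth(size, SETUP_NS, STREAM) / 1e6
        print(f"{size:5d}-byte chunks: {mb_s:8.1f} MB/s")
```

Under this model the effective rate approaches the streaming rate only once the chunk transfer time dominates the setup latency, which is why small random accesses suffer most.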

Across the SCSI disks I managed 140MB per second sustained write
speed, but not with the 4-way mirror; I used some RAID-0 for that one.
Also, the 140MB is for a linear file write; it's much slower when there
is a bit of seeking involved. Not much you can do about head movement.

Hope this helps, if anyone else is playing with similar gear...

	- Tel






-- 
Psyche-list mailing list
Psyche-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/psyche-list
