Two cents' worth, offered without having followed the previous discussions in this thread. See below, in-line.

> -----Original Message-----
> From: redhat-list-bounces@xxxxxxxxxx
> [mailto:redhat-list-bounces@xxxxxxxxxx]On Behalf Of Thierry ITTY
> Sent: Friday, August 13, 2004 11:33 AM
> To: redhat-list@xxxxxxxxxx
> Subject: bad blocks... random death
>
>
> this continues discussions about bad disk blocks not really
> bad and redhat
> 9 dying randomly
>
> we're now a few on this list experiencing various symptoms
> (dma errors, bad
> blocks on disks, system freeze or death) that look like
> hardware problems.
> after talking together we can now say that those problems
> are pure OS
> problems.

If all are SMP systems, then perhaps there is a spinlock conflict (multi-CPU contention) problem with the disk driver. But I doubt that the disk drivers in the kernel have changed in years. I am running RH9 on several heavily used SCSI-based Compaq multi-CPU machines with no problems. So, based on my experience, I don't believe this is a software issue.

>
> the disks with bad blocks work actually fine elswhere (in my
> case I ran the
> manufacturer low-level diags and no disk had any problem.
> and, ain't it
> very strange that 10 disks get the same problems at the same
> time ?!!!)

Not if you have an EMI (electromagnetic interference) shielding issue. The drives are fine. They might be cross-polluting each other, the cables and/or the controllers with EMI. That will corrupt the bit stream between the drives and the controller and give you errors. The heavier you use the drives, the more the magnetic coils that move the heads are used. Those coils put out an EMI field. The more you use the drives, the more consistent that EMI field is, and without good grounding it "leaks" into whatever copper ground path is available, including your drive cables, power cables, etc. Normally EMI is drained off through the drive's grounds to the chassis.
It's grounded to the chassis and, through the chassis, to the ground line on the power supply and on to earth.

Check the following, if you haven't already, as it applies to your system:

1) Get an electrical outlet tester at your local Home Depot, Lowe's, et al.
2) Check the outlets your systems are plugged into. (If you use outlets other than NEMA 5-15R/5-20R (household type), get a suitable tester or bring in an electrical testing service to check your grounds.)
3) Make sure you have a good, reliable earth ground at the outlet. If you don't, get it fixed. You would be surprised at how many outlets don't have valid earth grounds. If you are in a commercial building, your data center outlets should have been installed with ISOLATED grounds, that is, a separate ground wire between the power panel and the receptacle. Most commercial electrical work uses the metal jacket as a ground path, and that tends to come apart over time (i.e. NO MORE GROUND).
4) Check the power supply. Make sure you are not overloading it past its rated maximum output. Make sure that it is grounded to the chassis and to the earth ground. Normally it grounds the chassis through its case, but some have separate ground connections; look for ground screw connections.
5) If your drives have ground screws or tabs on them, connect them to a reliable chassis ground point. Don't assume they have a good ground through the drive mounting screws.
6) Use round shielded cables and watch the grounds on them. If the shields are grounded at one end only, make sure that the connected end goes to a valid ground source.
7) Grounds are normally connected at a single end to prevent ground loops; that is, you don't want more than one ground path here if you can help it. Multiple ground paths won't help and can hurt under the wrong circumstances. Drives with ground tabs don't generally ground through the mounting screws, but check the drive specs.
   A cable with the shield connected at both ends is also expecting to ground the drive; the cable should be connecting to a ground pin on the drive's interface.
8) If you have these drives "dense packed" in your chassis, you might want to consider putting grounded shields between them if all else fails, grounded copper plates for example.
9) Make sure that you route the power cables away from the drive controller cables within the chassis.
10) Look for ways that EMI could be crossing between drives, cables and controllers.
11) You might just have one really EMI-noisy drive. There are EMI meters that can be used to measure EMI levels.
12) You can also be subject to radiation at a different wavelength, known as RFI, or Radio Frequency Interference.

>
> the problem happens on various machines (gigabyte, asus,
> athlon, pentium,
> maxtor, western...).
>
> it seems it is related to high load periods (in my case a
> heavily used file
> server).
>
> we've been advised to change dma disks settings. I tried
> various things (no
> dma at all, forcing mdma0 or udma2). the system behave
> differently (either
> no errors or other errors as dma timeouts), but it's not
> working quite well
> (for example deactivating dma on disks lowers the average network
> throughput from 50 MB/s to 1.5 !!! almost 40 times slower !!!
>
> we really need help to investigate this problem which causes
> io errors and
> fs corruption !
>
> tia
>
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>
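
On the high-load correlation quoted above: before swapping hardware, it may help to confirm when the errors actually occur. Here is a minimal Python sketch, not a tested tool, that tallies suspicious log lines per hour so the busy hours can be compared against the quiet ones. Assumptions on my part: the log uses the traditional syslog timestamp prefix ("Aug 13 11:33:01 host ..."), the keywords "dma", "i/o error" and "timeout" are only guesses at what your kernel prints, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Traditional syslog prefix, e.g. "Aug 13 11:33:01 host kernel: ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s")

# Assumed keywords (lowercase); adjust to whatever your kernel really logs.
KEYWORDS = ("dma", "i/o error", "timeout")


def tally_errors_by_hour(lines, keywords=KEYWORDS):
    """Count lines containing any keyword, grouped by (month, day, hour)."""
    counts = Counter()
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a syslog-style timestamp
        lowered = line.lower()
        if any(k in lowered for k in keywords):
            key = (m.group(1), int(m.group(2)), int(m.group(3)))
            counts[key] += 1
    return counts


def report(counts):
    """Return one printable row per hour bucket, in first-seen order."""
    return ["%s %2d %02d:00  %d" % (mo, d, h, n)
            for (mo, d, h), n in counts.items()]
```

To try it, pass in the lines of your system log (on Red Hat that is normally /var/log/messages), for example `print("\n".join(report(tally_errors_by_hour(open("/var/log/messages", errors="replace")))))`. If the counts cluster in the heavy-use hours, that alone does not separate a grounding problem from a driver problem, but it does tell you when to take measurements.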