Re: Silent corruption on AMD64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Aaron Lehmann wrote:
> I discovered a reproducible way of causing silent file corruption.
...
> 1. Heavy Ethernet load (nc remotehost < /dev/zero)
> 2. Heavy disk write load on any non-sata_sil drive (cat /dev/zero > /path)
> 3. Heavy disk read load on any other drive (tar c /path | cat > /dev/null)

Since it shows up under heavy load that includes unrelated devices, I
think ruling out hardware problems is important.  Some suggestions:

- Use mcelog to see if you're getting any machine check exceptions
  that would indicate hardware error: http://freshmeat.net/projects/mcelog/

- Use the edac module to turn on pci parity and memory error checks:
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/drivers/edac/edac.txt

- Run memtest86+ for several loops to make sure your RAM is ok

- Try moving the SiI card to a different slot

- Try running the SATA drives from a separate power supply

- Move disks and cables around to see whether the problem follows the
  disks, the cables, or the controllers

- Try enabling the "spread spectrum" clock option in your BIOS to
  reduce EMI

-jim
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux