On 02/08/2010 05:11 AM, Håkon Løvdal wrote:
Hi. I have had some trouble with the machine I want to use as a file server. After letting the "get raid up and running reliably" project lie dormant for some time, I tried again this Friday.

After connecting the disks, the status was as follows: 4 out of 6 disks in a raid6 setup were recognised (see log-1). I was able to mount the volume once the machine had finished booting. I then added the two missing disks with mdadm; one of them started rebuilding and the other was for some reason not recognised (log-2). The rebuild of the disk was successful (log-3), but later some errors occurred, see log-4 below, and now only three disks are left in the array (log-5).

Are these errors related to Tejun's recent statement "Sil3112/3114 are now virtually the only controllers with occasional and unresolved data corruption issues."?

Disks sda (hosting the root file system for the OS), sdb, sdc and sdd are connected to the motherboard, while sde, sdf and sdg are connected to a controller card using the 3114:
..
---BEGIN log-4---
Feb 6 07:09:57 localhost kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Feb 6 07:09:57 localhost kernel: ata8.00: BMDMA2 stat 0x6c0009
Feb 6 07:09:57 localhost kernel: ata8.00: cmd 25/00:80:cf:cd:69/00:00:2f:00:00/e0 tag 0 dma 65536 in
Feb 6 07:09:57 localhost kernel: res 51/40:00:e4:cd:69/00:00:2f:00:00/e0 Emask 0x9 (media error)
Feb 6 07:09:57 localhost kernel: ata8.00: status: { DRDY ERR }
Feb 6 07:09:57 localhost kernel: ata8.00: error: { UNC }
That's fairly definitive: an uncorrected read error (UNC) reported by the drive itself, not by the controller. You might want to check the drive's SMART status. It could be a bad drive, or potentially other causes such as excessive vibration, high temperature or power issues.
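For anyone who wants to read the cmd/res lines in log-4 directly, here is a rough Python sketch that decodes them. The field order is an assumption based on how I understand libata to print a taskfile (first byte is command or status, then feature or error, sector count, three LBA bytes, the same five "high order" bytes, and the device register). The helper names (parse_taskfile, flag_names) are made up for illustration and are not part of any kernel or library interface.

```python
# Sketch: decode the "cmd"/"res" taskfile strings libata logs on an error.
# Assumed layout (illustrative, not an official API):
#   XX / feature:nsect:lbal:lbam:lbah / hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device
# where XX is the command opcode on a "cmd" line and the status register on a "res" line.

# Subset of status and error register bits, matching the names the kernel prints.
STATUS_BITS = [(0x80, "BUSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def parse_taskfile(text):
    """Split a taskfile string into its registers and rebuild the 48-bit LBA."""
    groups = text.strip().split("/")
    if len(groups) != 4:
        raise ValueError("expected four '/'-separated groups: %r" % text)
    first = int(groups[0], 16)
    second, nsect, lbal, lbam, lbah = (int(b, 16) for b in groups[1].split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in groups[2].split(":")
    )
    device = int(groups[3], 16)
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "first": first,            # command opcode (cmd) or status register (res)
        "second": second,          # feature (cmd) or error register (res)
        "count": (hob_nsect << 8) | nsect,
        "lba": lba,
        "device": device,
    }


def flag_names(value, table):
    """Return the names of the bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


cmd = parse_taskfile("25/00:80:cf:cd:69/00:00:2f:00:00/e0")
res = parse_taskfile("51/40:00:e4:cd:69/00:00:2f:00:00/e0")

print("command opcode: 0x%02x" % cmd["first"])
print("request: %d sectors starting at LBA 0x%x" % (cmd["count"], cmd["lba"]))
print("status:", flag_names(res["first"], STATUS_BITS))
print("error: ", flag_names(res["second"], ERROR_BITS))
print("LBA reported in result: 0x%x (offset %d into the request)"
      % (res["lba"], res["lba"] - cmd["lba"]))
```

Under those assumptions, the request is 128 sectors (the 65536 bytes shown as "dma 65536 in") and the LBA in the result falls inside that range, which is consistent with the drive failing to read one specific sector rather than the transfer being corrupted on the way.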