Tejun Heo wrote:
I can offer to rebuild that md in a test environment and give you
access to it, if you're interested.
Can you hook up those failed drives to a different controller? Say,
ahci or ata_piix and put them under write load (ext3 w/ barrier=1 and
copying lots of files into it should work) and see whether the problem
reproduces?
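A minimal sketch of the kind of write load described above, assuming the
ext3 filesystem (mounted with barrier=1) is already mounted somewhere and
that forcing each file out with fsync is an acceptable stand-in for "copying
lots of files". The function name write_load, the file naming, and the
default sizes are hypothetical and only here for illustration:

```python
import os


def write_load(target_dir, n_files=1000, size=64 * 1024):
    """Write n_files files of `size` random bytes into target_dir.

    Each file is flushed and fsync'ed before the next one is written,
    so the data is pushed out to the drive rather than left sitting in
    the page cache. Returns the total number of bytes written.
    """
    payload = os.urandom(size)
    total = 0
    for i in range(n_files):
        path = os.path.join(target_dir, f"load_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        total += size
    return total
```

Pointing target_dir at a directory on the array under test and watching
dmesg while it runs should show whether the exceptions come back.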
I can move the disks to a sata_promise controller. I also have a sata_via
controller, but I cannot get those disks to work on it at all (it initially
sees the disk, but does not finish init).
The machine those disks are on does not have any other SATA controllers.
Here are the errors I get. Looking at it more closely, though, I don't
appear to be getting the reset, just this error from time to time:
sd 9:0:0:0: [sde] 976773168 512-byte hardware sectors (500108 MB)
sd 9:0:0:0: [sde] Write Protect is off
sd 9:0:0:0: [sde] Mode Sense: 00 3a 00 00
sd 9:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x280000 action 0x0
ata8.00: BMDMA2 stat 0x687d8009
ata8.00: cmd 25/00:80:a7:00:1d/00:01:1d:00:00/e0 tag 0 cdb 0x0 data 196608 in
         res 51/04:8f:98:01:1d/00:00:1d:00:00/f0 Emask 0x1 (device error)
ata8.00: configured for UDMA/100
That's a device abort error on read. The drive just can't read one of
the requested sectors, and it's not sata_sil24. It's a bmdma one.
I have 4 identical disks. With all 4 connected to the SIL controller,
all of them give some errors. Moving 2 of the disks to a promise
controller makes the errors go away on the 2 connected to the promise
controller. All drives are part of a software raid5 array.
Ah.. okay, sata_sil. Roger, the moving and errors are not very likely
to have anything to do with each other. The only possibility is
transmission problems, but the drive didn't report a transport error
(ICRC), so it's more likely that the drive was experiencing temporary
failures. It's also possible that the drive set ABRT even though the
actual problem was with the transport, tho.
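For reference, a rough sketch of how the "res" line above can be decoded.
In libata's exception output the first byte of "res" is the ATA status
register and the second is the error register. The bit names follow the
ATA register definitions; the helper names decode_res and flag_names are
hypothetical, and the regex assumes the line layout shown in this thread:

```python
import re

# ATA status register bits (high bit first).
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}

# ATA error register bits (high bit first).
ERROR_BITS = {
    0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
    0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF",
}


def flag_names(value, table):
    """Return the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def decode_res(line):
    """Pull the status/error bytes out of a libata 'res' line."""
    m = re.search(r"res\s+([0-9a-fA-F]{2})/([0-9a-fA-F]{2}):", line)
    if m is None:
        raise ValueError("no 'res' taskfile found in line")
    status = int(m.group(1), 16)
    error = int(m.group(2), 16)
    return {
        "status": flag_names(status, STATUS_BITS),
        "error": flag_names(error, ERROR_BITS),
    }
```

For the line above, 51/04 decodes to status DRDY, DSC, ERR and error ABRT,
with ICRC not set, which is why it reads as a device abort and not as a
reported transport error.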
If you move the drive back to the sata_sil, do those problems appear
again? Anyways, this doesn't really have anything to do with what Hans
is seeing.
I can swap the disks around next time I reboot the machine: the 2 on the
promise will go to the sil and the 2 on the sil will go to the promise. From
past testing I expect the disks on the sil to have the errors and the ones on
the promise to not have errors.
After I looked at the error more carefully I thought that too. I had
originally thought I was getting resets as well, but I was wrong about that.
Roger