Errors when copying between drives on a SiI3114 controller under kernel 2.6.18

"Jonathan Bell" <doggs.lay.eggs@xxxxxxxxxxxxxx> · Sat, 07 Oct 2006 14:11:51 +0100

Hello

I have been getting input/output errors when copying data between drives
attached to the same controller. I have two SiI3114 cards, a set of four
Seagate 250GB drives (Model: ST3250824NS  Rev: 3.AE) and a set of three
Maxtor 300GB drives (Model: 6L300S0  Rev: BACE). The problem is
reproducible across all the drives and both controller cards.

When a file is copied from one drive on the controller to another on the
same controller, whether via dd or cp, the file that gets written is
corrupted along with the filesystem itself. Here is an extract from dmesg:

[12689.451466] attempt to access beyond end of device
[12689.451475] sdb1: rw=0, want=2339438600, limit=488392002
[12689.451480] attempt to access beyond end of device
[12689.451484] sdb1: rw=0, want=18446744056529747976, limit=488392002
[12689.453822] attempt to access beyond end of device
[12689.453831] sdb1: rw=0, want=2339438600, limit=488392002
[12689.453834] Buffer I/O error on device sdb1, logical block 292429824
[12689.453935] attempt to access beyond end of device
[12689.453938] sdb1: rw=0, want=2339438600, limit=488392002
[12689.453941] Buffer I/O error on device sdb1, logical block 292429824
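
As a sanity check on the numbers in the extract above, here is a small sketch that decodes them. It is not a diagnosis. It assumes that want/limit are counted in 512-byte sectors and that the filesystem uses 4 KiB blocks; neither is stated in the log, and the variable and function names are only illustrative.

```python
# Values copied verbatim from the dmesg extract.
limit = 488392002                    # size of sdb1 as reported by the kernel
want_small = 2339438600              # first out-of-range request
want_huge = 18446744056529747976     # second out-of-range request
logical_block = 292429824            # from the "Buffer I/O error" lines

SECTOR_BYTES = 512                   # assumption: want/limit are 512-byte sectors
FS_BLOCK_BYTES = 4096                # assumption: 4 KiB filesystem blocks


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


partition_bytes = limit * SECTOR_BYTES
sectors_per_block = FS_BLOCK_BYTES // SECTOR_BYTES
# Sector just past the end of the failing logical block.
block_end_sector = (logical_block + 1) * sectors_per_block

print("partition size in bytes:      ", partition_bytes)
print("end sector of failing block:  ", block_end_sector)
print("matches want=2339438600:      ", block_end_sector == want_small)
print("huge request as signed 64-bit:", as_signed64(want_huge))
```

Under those assumptions the partition size comes out at roughly 250GB, the failing logical block lines up exactly with the want=2339438600 request, and the very large value is a negative number when read as a signed 64-bit quantity. That would be consistent with garbage block numbers being read back from the filesystem, although it does not prove where the corruption happens.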

The actual commands used were:

cp ~/hugefile /mnt/sda1
cp /mnt/sda1/hugefile /mnt/sdb1/
md5sum /mnt/sda1/hugefile /mnt/sdb1/hugefile

where hugefile is 4.9GB of piped output from "yes 0123456789", stored in
~/ on a PATA drive used for the root filesystem and /home.
md5sum calculates the checksum of the first file fine and errors on the
second file.

ccf5f9052aa1fac3062c3f1920abb1fc  /mnt/sda1/hugefile
md5sum: /mnt/sdb1/hugefile: Input/output error
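
Because hugefile is just the 11-byte line "0123456789\n" repeated, the copy can be checked against the pattern directly, which shows where it first goes wrong rather than only that it does. The following is a minimal sketch of that idea; first_bad_offset is a hypothetical helper, not an existing tool, and it assumes the file starts at the beginning of a pattern line.

```python
PATTERN = b"0123456789\n"   # one line of `yes 0123456789` output


def first_bad_offset(path, chunk_size=1 << 20):
    """Scan a file that should consist of PATTERN repeated.

    Returns None if every byte matches, ("mismatch", offset) for the first
    byte that deviates, or ("read error", offset) if the read itself fails
    (for example with EIO) starting at that offset.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError:
                return ("read error", offset)
            if not chunk:
                return None
            # Build the bytes expected at this offset: the pattern,
            # rotated to the current phase and cut to the chunk length.
            phase = offset % len(PATTERN)
            reps = len(chunk) // len(PATTERN) + 2
            expected = (PATTERN * reps)[phase:phase + len(chunk)]
            if chunk != expected:
                for i, (got, exp) in enumerate(zip(chunk, expected)):
                    if got != exp:
                        return ("mismatch", offset + i)
            offset += len(chunk)
```

Calling first_bad_offset("/mnt/sdb1/hugefile") would then report either the first corrupted byte or the offset at which reads start failing. Comparing that offset across runs might show whether the damage always begins at the same place.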

The exact same problem happens when the drives are reversed, i.e. when the
file is copied to sdb1 first and then copied/dd'd to sda1, md5sum on sda1
bombs. There is no problem when the file is copied to each drive
individually from ~/hugefile, or when the above test is run between drives
on different controllers. All the drives have been rotated and the same
test repeated, with exactly the same result. Each drive has had a complete
"badblocks -w -s" run on it with no problems.

I have performed the same test with ext2, ext3 and reiserfs 3.6 and all
exhibit the same behaviour: seeking beyond the end of the disk to
ludicrously high sectors.

I would like some help tracking down the cause of this problem as I have
practically exhausted the methods currently at my disposal - my best guess
at the moment is that data being written to one port is somehow being
trampled on, but only when there is I/O active on another port. I will
continue testing to see if simultaneous writes to multiple drives on a
controller cause the same problem.
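
One possible shape for that simultaneous-write test is sketched below. It writes the same repeating pattern to several files at once, one thread per target, and then compares checksums. The function names and target paths are illustrative assumptions. Note that with a file smaller than RAM the read-back may be served from the page cache rather than from the disk, so a file as large as hugefile would be needed for the check to mean anything.

```python
import hashlib
import os
import threading

PATTERN = b"0123456789\n"   # same content as `yes 0123456789`


def write_pattern(path, total_bytes):
    """Write total_bytes of the repeating pattern to path and fsync it."""
    block = PATTERN * 100_000          # whole number of lines keeps the phase
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            piece = block[:remaining]
            f.write(piece)
            remaining -= len(piece)
        f.flush()
        os.fsync(f.fileno())


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def concurrent_write_test(paths, total_bytes):
    """Write to all paths at the same time, then checksum each result."""
    threads = [
        threading.Thread(target=write_pattern, args=(path, total_bytes))
        for path in paths
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {path: md5_of(path) for path in paths}
```

For example, concurrent_write_test(["/mnt/sda1/t", "/mnt/sdb1/t"], 5 * 10**9) should return identical digests for both paths if nothing is trampled. Differing digests, or an OSError raised while reading back, would point at the concurrent I/O.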

Thanks for any advice you can give,
Jonathan
Attachment: lspci.txt.gz
Description: GNU Zip compressed data

Attachment: dmesg.txt.gz
Description: GNU Zip compressed data