[cc'ing Carlos Pardo]
Jonathan Bell wrote:
On Sun, 08 Oct 2006 05:33:42 +0100, Tejun Heo <htejun@xxxxxxxxx> wrote:
Hello.
Jonathan Bell wrote:
The problem is that when copying a file off one drive on the controller
to another on the same controller, be it via dd or cp, the file that
gets written becomes corrupted along with the filesystem itself. Here is
an extract from dmesg:
That's very weird.
[12689.451466] attempt to access beyond end of device
[12689.451475] sdb1: rw=0, want=2339438600, limit=488392002
[12689.451480] attempt to access beyond end of device
[12689.451484] sdb1: rw=0, want=18446744056529747976, limit=488392002
[12689.453822] attempt to access beyond end of device
[12689.453831] sdb1: rw=0, want=2339438600, limit=488392002
[12689.453834] Buffer I/O error on device sdb1, logical block 292429824
[12689.453935] attempt to access beyond end of device
[12689.453938] sdb1: rw=0, want=2339438600, limit=488392002
[12689.453941] Buffer I/O error on device sdb1, logical block 292429824
[--snip--]
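The numbers in the extract above can be sanity-checked with a little
arithmetic. The Python sketch below uses constants copied from the log
and assumes 512-byte sectors and 4 KiB filesystem blocks; the block size
is an assumption, not something stated in the report.

```python
# Values copied from the dmesg extract.
LIMIT = 488392002                 # size of sdb1 in sectors
WANT = 2339438600                 # end sector of the rejected request
WANT_HUGE = 18446744056529747976  # the implausibly large request
BLOCK = 292429824                 # logical block from "Buffer I/O error"

SECTOR_BYTES = 512   # assumption: 512-byte sectors
BLOCK_BYTES = 4096   # assumption: 4 KiB filesystem blocks


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


sectors_per_block = BLOCK_BYTES // SECTOR_BYTES
block_end_sector = (BLOCK + 1) * sectors_per_block

print("partition size in GB:", LIMIT * SECTOR_BYTES / 1e9)
print("end sector of logged block:", block_end_sector)
print("matches want:", block_end_sector == WANT)
print("huge want as signed 64-bit:", as_signed64(WANT_HUGE))
```

Under those assumptions the logged block ends exactly at the "want"
sector, and the very large "want" value is a negative number wrapped to
unsigned 64 bits. That would be consistent with a bogus block number
being requested rather than with the device size being misreported, but
it does not by itself say where the bad number came from.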
I would like some help tracking down the cause of this problem as I have
practically exhausted the methods currently at my disposal - my best
guess at the moment is that data being written to one port is being
trampled on somehow, but only when there is I/O active on another port.
I will continue testing to see if simultaneous writes to multiple drives
on a controller cause the same problem.
Can you repeat the test using raw devices - /dev/sdX? I don't think
filesystem is at fault, so let's rule it out. Also, please post the
result of lspci -nvvvxxx
Thanks.
See attached for the lspci output.
I have confirmed the problem still happens with the following command:
yes 0123456789 | dd of=/dev/sda1 & dd if=/dev/sdb1 of=/dev/null &
I killed it after a while, then did "uniq /dev/sda1"
The results were... interesting: instead of just 0123456789 I ended up
with a whole load of variations on the theme of "0123456789". Attached
is an extract. While this proves the problem is still there, I don't
really know how to send you any useful information without sending you a
~256 megabyte dump of /dev/sda1 (even compressed it is still
approximately 1.8 MB).
From the looks of things the corruptions are few and far between - I
wouldn't know how to check how often they occur or what length they are
though.
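One possible way to measure how often the corruptions occur and how long
they are is to compare every byte against the pattern that "yes
0123456789" writes. The Python sketch below is only an illustration: the
function name is made up, and it assumes the pattern starts at offset 0
and that corruption overwrites bytes in place rather than shifting them.

```python
import io

PATTERN = b"0123456789\n"  # one line as emitted by `yes 0123456789`


def find_corrupt_runs(stream, pattern=PATTERN, chunk_size=1 << 20):
    """Return (offset, length) for each run of bytes that differ from the
    repeating pattern, assuming the pattern starts at offset 0."""
    runs = []
    run_start = None
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:
            bad = byte != pattern[offset % len(pattern)]
            if bad and run_start is None:
                run_start = offset            # a damaged run begins here
            elif not bad and run_start is not None:
                runs.append((run_start, offset - run_start))
                run_start = None              # the run ended just before here
            offset += 1
    if run_start is not None:                 # run extends to end of stream
        runs.append((run_start, offset - run_start))
    return runs


# Self-contained demonstration on synthetic data, not on a real device.
good = PATTERN * 100
damaged = bytearray(good)
damaged[50:54] = b"\x00\x00\x00\x00"
print(find_corrupt_runs(io.BytesIO(bytes(damaged))))
```

On a real dump the stream would be the saved image opened in binary
mode, limited to the region dd actually wrote, since anything past that
point is old data and would show up as one long mismatch. A byte inside
a damaged region that happens to equal the expected value splits the
region into two runs, and the per-byte loop is slow on hundreds of
megabytes.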
Also, I probed the validity of the "Buffer I/O error" and found that the
logical block wasn't actually corrupted - dd read it just fine - it was
full of 0x00 (from badblocks I guess).
I cannot reproduce your problem here. Can you retest after running the
following commands?
# setpci -s 01:07.0 0c.b=04
# setpci -s 01:08.0 0c.b=04
The above commands set the cache line size to 16 bytes.
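For reference, the byte at offset 0x0c in PCI configuration space is the
Cache Line Size register, and it counts 32-bit words rather than bytes.
A small Python sketch of the conversion (the helper name is only for
illustration):

```python
CACHE_LINE_SIZE_OFFSET = 0x0C  # PCI config space offset used by setpci above
DWORD_BYTES = 4                # the register counts 32-bit words


def cache_line_bytes(register_value):
    """Convert a Cache Line Size register value to a size in bytes."""
    return register_value * DWORD_BYTES


print(cache_line_bytes(0x04))  # value written by the setpci commands
```

So writing 04 selects a 16-byte cache line, while a value of 0x10 would
correspond to 64 bytes.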
Carlos, the whole thread can be found at the following URL. The
lspci -nvvvxxx output is there too.
http://thread.gmane.org/gmane.linux.ide/13381/focus=13381
--
tejun