At 18:14 3/11/2008, Marc Bejarano wrote:
>At 11:36 3/10/2008, James Bottomley wrote:
>>On Fri, 2008-03-07 at 17:40 -0500, Marc Bejarano wrote:
>>> This is hard to explain. It looks like page 309713975 got written
>>> out to the proper spot, but then the first 10752 bytes got written
>>> out again to the wrong spot?!?
>>
>>this pattern of corruption
>>is almost completely definitive of a disk problem with head positioning.
<snip>
>>I'm afraid the only way to confirm this theory definitively will be with
>>the destructive disktest from autotest (it was actually constructed to
>>check for drive head positioning errors)
>
>thanks to you (and grant) for the pointer! will try that next.

unfortunately, we have been unable to reproduce the corruption using
disktest :( even after running for days, it turns up no corruption.
my colleague had already written a similar tool and wasn't able to
reproduce the problem with it, either. i don't think this rules out
a head positioning problem, though.

we can easily reproduce the issue using the server's intended
workload with the intended configuration, but that isn't something we
can give to seagate to reproduce in-house.

having never played with blktrace, i have no idea what its
capabilities are. can it be used to record not just the IOs, but
also their timings? any other ideas? i'm at a loss for how to turn
my reproducible test case into something i can send to seagate for
investigation.

thanks,
marc