Re: Debugging new HW XOR engine driver

Dan,
The corruption I see is an ext3_check_descriptors error, and the message suggests running fsck. When I run fsck, it complains that "there is no valid file system". The corruption seems to happen only when huge data files are written to /dev/md0.
I tried writing one full stripe (12 kB with 4 disks) and half a stripe (6 kB). I don't see corruption with small files; every time I write data and read it back, it works fine up to 40 MB.
I know this is a lot to ask, but do you happen to have the sample code you used to debug your driver? I am using the xor_blocks() function inside async_xor() to compute the XOR in software and compare it with the HW result. I am not sure that is the right way to do it. So far I have not seen a data mismatch.
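
For reference, the software-versus-hardware comparison described above can be sketched in userspace roughly as follows. This is only an illustration of the idea, not driver code: xor_blocks_sw() and first_mismatch() are hypothetical helper names, and the "hardware" result is simulated here. Such a comparison is only meaningful once the engine has finished writing the destination buffer; comparing at submission time can pass or fail for the wrong reasons.

```python
def xor_blocks_sw(blocks):
    """Software reference: byte-wise XOR of equal-length source blocks."""
    if not blocks:
        raise ValueError("need at least one source block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all source blocks must have the same length")
    parity = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def first_mismatch(expected, actual):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One buffer is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


if __name__ == "__main__":
    # Three 4 KiB data blocks, as in a 4-disk RAID-5 stripe.
    data = [bytes([d + 1]) * 4096 for d in range(3)]
    expected = xor_blocks_sw(data)

    # Simulated engine output with one corrupted byte (hypothetical).
    hw_result = bytearray(expected)
    hw_result[100] ^= 0xFF
    print(first_mismatch(expected, bytes(hw_result)))  # prints 100
```

Reporting the offset of the first mismatch, rather than just pass/fail, helps show whether errors line up with descriptor or page boundaries.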
Thanks and Regards,
Marri
 

----- Original Message ----
From: Dan Williams <dan.j.williams@xxxxxxxxx>
To: tirumalareddy marri <tirumalareddymarri@xxxxxxxxx>
Cc: thomas62186218@xxxxxxx; linux-raid@xxxxxxxxxxxxxxx
Sent: Tuesday, July 15, 2008 11:48:46 PM
Subject: Re: Debugging new HW XOR engine driver

On Tue, Jul 15, 2008 at 3:52 PM, tirumalareddy marri
<tirumalareddymarri@xxxxxxxxx> wrote:
> I am able to create a 40 MB disk and mount it (mkfs.ext3 -b 4096 /dev/md0 10000). I was able to copy files to this mounted disk and read them back. If I increase the size beyond 40 MB, the file system fails to mount.
> Is it possible that the data I read and wrote was in the page cache and never really written to the hard disks?

What does the corruption look like?  Does it seem to be wrong data or
stale data?

> Is it safe to say RAID-5 is partially working ?

Without more information this sounds like the hw-xor driver is broken.
What kernel version are you developing against?  You may want to take
a look at the dmatest client in async_tx/next [1].  It currently only
supports copy tests, but should exercise your driver's descriptor
processing routines.  When I tracked down bugs in iop-adma I used
raid5 as the test client and modified the kernel to do data
verification after each calculation in the ops_complete_* routines.
This requires userspace to use a predictable data pattern when writing
to the array.
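
A rough userspace sketch of such a predictable pattern, in Python. Everything here is an assumption for illustration and not taken from the iop-adma debugging itself: the 8-byte record layout (a pass "seed" plus the record's own byte offset), the helper names, and the 4096-byte chunk size. Because each record carries its own offset and the seed of the pass that wrote it, a bad record can be classified as stale data (an older seed at the right offset) or wrong data (anything else).

```python
import io
import struct

# Each 8-byte record stores (seed, byte offset of the record), little-endian.
# The offset field is 32 bits, so it wraps for regions of 4 GiB or more.
WORD = struct.Struct("<II")


def make_pattern(offset, length, seed):
    """Pattern bytes for the region [offset, offset + length)."""
    if offset % WORD.size or length % WORD.size:
        raise ValueError("offset and length must be multiples of 8")
    return b"".join(
        WORD.pack(seed, pos & 0xFFFFFFFF)
        for pos in range(offset, offset + length, WORD.size)
    )


def write_pattern(f, total, seed, chunk=4096):
    """Write `total` bytes of pattern to a binary file object."""
    for off in range(0, total, chunk):
        f.write(make_pattern(off, min(chunk, total - off), seed))


def verify_pattern(f, total, seed, chunk=4096):
    """Return a list of (offset, found_seed, found_offset) for bad records."""
    bad = []
    for off in range(0, total, chunk):
        n = min(chunk, total - off)
        data = f.read(n)
        expected = make_pattern(off, n, seed)
        if data == expected:
            continue
        for i in range(0, n, WORD.size):
            rec = data[i:i + WORD.size]
            if rec == expected[i:i + WORD.size]:
                continue
            if len(rec) == WORD.size:
                bad.append((off + i,) + WORD.unpack(rec))
            else:
                bad.append((off + i, None, None))  # short read
    return bad


if __name__ == "__main__":
    buf = io.BytesIO()  # stand-in for a file on the array
    write_pattern(buf, 8192, seed=2)
    buf.seek(0)
    print(verify_pattern(buf, 8192, seed=2))  # prints []
```

On a real array the read-back can be served from the page cache, so the cache would need to be dropped or bypassed (for example with O_DIRECT) before verifying; otherwise the check may never touch the disks.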

--
Dan

[1] http://git.kernel.org/?p=linux/kernel/git/djbw/async_tx.git;a=shortlog;h=next
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
