Re: How to force rewrite of a smart detected bad block with raid5: checkarray?

Marc MERLIN <marc@xxxxxxxxxxx> · Wed, 19 Jan 2011 09:31:50 -0800

On Wed, Jan 19, 2011 at 08:41:15PM +1100, NeilBrown wrote:
> All you need to do is get md/raid5 to try reading the bad block.  Once it does
> that it will get a read error and automagically try to correct it.

So, if I get this right, raid5 only reads n-1 drives. Unless I'm unlucky
enough to have the bad disk be the parity stripe, just reading the file with
a bad stripe by luck would cause the kernel to recompute parity on the read
error and re-write the bad block?
(I also read in the online docs that raid4 actually reads all the blocks,
including parity, which is a bit slower, but would actually guarantee that
all blocks are read, and parity is still consistent at ready time?)

But back to your point: check, which I had started, will indeed do what I
was hoping it would, thanks.

> If you were really keen, you could 
>   cd /sys/block/mdXX/md
>   echo 3907029168 > sync_min
>   echo 3907029170 > sync_max
>   echo check > sync_action

I stopped the full check, and tried:

gargamel:/sys/block/md7/md# cat sync_min
244188936
gargamel:/sys/block/md7/md# cat sync_max
max
gargamel:/sys/block/md7/md# echo 3907029168 > sync_min
bash: echo: write error: Invalid argument

Any idea what went wrong here?

gargamel:/sys/block/md7/md# mdadm --detail /dev/md7
/dev/md7:
        Version : 1.02
  Creation Time : Thu Mar 25 20:15:00 2010
     Raid Level : raid5
     Array Size : 7814045696 (7452.05 GiB 8001.58 GB)
  Used Dev Size : 1953511424 (1863.01 GiB 2000.40 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Wed Jan 19 09:27:57 2011
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : gargamel.svh.merlins.org:7  (local to host gargamel.svh.merlins.org)
           UUID : 5884576b:0e402a5d:8629093c:ec020760
         Events : 28714

    Number   Major   Minor   RaidDevice State
       0       8      129        0      active sync   /dev/sdi1
       1       8      145        1      active sync   /dev/sdj1
       2       8      161        2      active sync   /dev/sdk1
       3       8      177        3      active sync   /dev/sdl1
       5       8      113        4      active sync   /dev/sdh1

As for docs, a bit of googling before posting didn't help. I since then
found the new README.checkarray in my /usr/share/doc (debian), so that helps
although it doesn't talk about check vs repair.

Also, I didn't find anything about sync_action, check, and repair in the
mdadm man page (a pointer to
https://raid.wiki.kernel.org/index.php/RAID_Administration
would me useful).
Actually the above page still says that you can't check just a range of blocks.

Is there more up to date documentation that I should be reading somewhere?

Thanks for your answer,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html