Re: chunks/sectors/blocks/stripes ??

donotcare@xxxxxxxxxxx · Sun, 05 Aug 2018 13:35:44 -0400

Thank you for the replies. I realize that a mismatch in the absence of a known hardware failure is difficult to resolve. I thought that maybe with raid6, the raid6check util would be able to repeatedly do parity checks with and without each device's data in a systematic way to determine which one was most likely to be errant. But at least in my case (see below) it did not.

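Anyway, I figured out the answer to the other part of my question. The mismatch locations that md reports in the syslog are in 512-byte sectors per device (not bytes), which is the reading that lines up with the stripe and page numbers raid6check prints below. And the stripe size is the same as the chunk size, at least in my arrays: 512KiB, or 1024 sectors.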

So I went from this:

[391225.243871] md1: mismatch sector in range 2742887048-2742887056
[391225.243877] md1: mismatch sector in range 2742887056-2742887064
[391225.243958] md1: mismatch sector in range 2742887064-2742887072
[391225.243959] md1: mismatch sector in range 2742887072-2742887080
[391225.243963] md1: mismatch sector in range 2742887080-2742887088

To this:

# raid6check /dev/md1 2678600  1
layout: 2
disks: 5
component size: 8001322745856
total stripes: 15261312
chunk size: 524288

disk: 0 - offset: 134217728 - size: 8001322745856 - name: /dev/sdk1 - slot: 4
disk: 1 - offset: 134217728 - size: 8001322745856 - name: /dev/sdi1 - slot: 3
disk: 2 - offset: 134217728 - size: 8001322745856 - name: /dev/sdg1 - slot: 2
disk: 3 - offset: 134217728 - size: 8001322745856 - name: /dev/sdd1 - slot: 1
disk: 4 - offset: 134217728 - size: 8001322745856 - name: /dev/sdc1 - slot: 0

Error detected at stripe 2678600, page 81: disk slot unknown
Error detected at stripe 2678600, page 82: disk slot unknown
Error detected at stripe 2678600, page 83: disk slot unknown
Error detected at stripe 2678600, page 84: disk slot unknown
Error detected at stripe 2678600, page 85: disk slot unknown

So unfortunately it did not point to a suspect device.
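
For anyone checking the arithmetic, here is a small sketch of how the logged ranges line up with the raid6check output. It assumes (my assumptions, not something either tool states) that the logged values are 512-byte sectors on each component device and that a raid6check "page" is 4096 bytes. The chunk size is the one raid6check printed; the function name is just for illustration.

```python
SECTOR_BYTES = 512      # assumed unit of the values in the md log lines
CHUNK_BYTES = 524288    # "chunk size" as printed by raid6check
PAGE_BYTES = 4096       # assumed size of one raid6check page


def locate(sector):
    """Map a per-device sector number to (stripe, page within that stripe)."""
    sectors_per_chunk = CHUNK_BYTES // SECTOR_BYTES  # 1024
    sectors_per_page = PAGE_BYTES // SECTOR_BYTES    # 8
    stripe, rest = divmod(sector, sectors_per_chunk)
    return stripe, rest // sectors_per_page


# Start of each range from the syslog excerpt
for start in (2742887048, 2742887056, 2742887064, 2742887072, 2742887080):
    print(start, locate(start))
```

Under those assumptions all five ranges fall in stripe 2678600, on pages 81 through 85, which is what raid6check reported.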

I tried to see what file was there. Not knowing exactly how to do this, I divided the start of the first mismatch range by the ext4 4K block size:

2742887048 / 4096 = 669650.158203125

So I guess it's in the middle of block 669650.

debugfs:  icheck 669650
Block   Inode number
669650  299761790

debugfs:  ncheck 299761790
Inode   Pathname
299761790       /blah/my/file/name.ext
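
In case it helps anyone repeat the lookup for more than one block, this is a rough sketch of scripting the same two debugfs requests. It relies on "debugfs -R <request> <device>", which runs a single request non-interactively; the helper names are made up, and the parsing assumes the two-column output shown above. Whether the block number fed into it is the right one is a separate question.

```python
import subprocess


def parse_pairs(text):
    """Parse two-column debugfs output into (number, value) pairs.

    Header lines such as "Block  Inode number" are skipped because their
    first column is not a number.
    """
    pairs = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            pairs.append((int(parts[0]), parts[1].strip()))
    return pairs


def run_debugfs(request, device):
    """Run one debugfs request (read-only by default) and return stdout."""
    done = subprocess.run(["debugfs", "-R", request, device],
                          capture_output=True, text=True, check=True)
    return done.stdout


def paths_for_block(block, device):
    """Block -> owning inode (icheck) -> pathnames (ncheck)."""
    paths = []
    for _, owner in parse_pairs(run_debugfs(f"icheck {block}", device)):
        if owner.isdigit():  # skip blocks that debugfs reports as not found
            out = run_debugfs(f"ncheck {owner}", device)
            paths += [path for _, path in parse_pairs(out)]
    return paths


# Example (needs root and the real device):
#   paths_for_block(669650, "/dev/md1")
```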

Does that seem right?   To fix, I was thinking:

1) umount /dev/md1
2) echo repair > /sys/block/md1/md/sync_action
3) fsck -f /dev/md1
4) mount /dev/md1
5) rm -f /blah/my/file/name.ext
6) # restore /blah/my/file/name.ext  from backups

Would this work?

Does the repair sync_action fix mismatches on a non-degraded array? Given that I plan to delete the file, I guess it doesn't matter how the mismatch is fixed?
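
For step 2, this is roughly how I would start the repair and wait for it to finish, as a sketch only. It uses the sync_action and mismatch_cnt files under /sys/block/md1/md; the function names and the polling interval are my own choices, and it has to run as root.

```python
import time
from pathlib import Path


def md_dir(name, sys_block="/sys/block"):
    """Return the sysfs md directory for an array such as "md1"."""
    return Path(sys_block) / name / "md"


def start_repair(md):
    """Ask md to start a repair pass over the array."""
    (md / "sync_action").write_text("repair\n")


def wait_idle(md, poll_seconds=10.0):
    """Block until sync_action reads "idle" again."""
    while (md / "sync_action").read_text().strip() != "idle":
        time.sleep(poll_seconds)


def mismatch_count(md):
    """Read the mismatch counter left by the last check or repair pass."""
    return int((md / "mismatch_cnt").read_text())


# Example (needs root):
#   md = md_dir("md1")
#   start_repair(md)
#   wait_idle(md)
#   print(mismatch_count(md))
```

My understanding is that a follow-up "check" pass is the way to confirm the count has dropped to zero, but I have not verified that on this array.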

Thanks..
-Matt