On Mon, Apr 25, 2011 at 6:30 PM, Mark Knecht <markknecht@xxxxxxxxx> wrote:
> On Mon, Apr 25, 2011 at 3:32 PM, Mark Knecht <markknecht@xxxxxxxxx> wrote:
>> I did a drive check today, first time in months, and found I have a
>> high mismatch count on my RAID1 root device. What's the best way to
>> handle getting this cleaned up?
>>
>> 1) I'm running some smartctl tests as I write this.
>>
>> 2) Do I just do an
>>
>> echo repair
>>
>> to md126, or do I have to boot a rescue CD before I do that?
>>
>> If you need more info please let me know.
>>
>> Thanks,
>> Mark
>>
>> c2stable ~ # cat /sys/block/md3/md/mismatch_cnt
>> 0
>> c2stable ~ # cat /sys/block/md6/md/mismatch_cnt
>> 0
>> c2stable ~ # cat /sys/block/md7/md/mismatch_cnt
>> 0
>> c2stable ~ # cat /sys/block/md126/md/mismatch_cnt
>> 222336
>> c2stable ~ # df
>> Filesystem      1K-blocks      Used Available Use% Mounted on
>> /dev/md126       51612920  26159408  22831712  54% /
>> udev                10240       432      9808   5% /dev
>> /dev/md7        389183252 144979184 224434676  40% /VirtualMachines
>> shm               6151452         0   6151452   0% /dev/shm
>> c2stable ~ # cat /proc/mdstat
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md6 : active raid1 sdc6[2] sda6[0] sdb6[1]
>>       247416933 blocks super 1.1 [3/3] [UUU]
>>
>> md7 : active raid6 sdc7[2] sda7[0] sdb7[1] sdd2[3] sde2[4]
>>       395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
>>
>> md3 : active raid6 sdc3[2] sda3[0] sdb3[1] sdd3[3] sde3[4]
>>       157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
>>
>> md126 : active raid1 sdc5[2] sda5[0] sdb5[1]
>>       52436032 blocks [3/3] [UUU]
>>
>> unused devices: <none>
>> c2stable ~ #
>
> The smartctl tests that I ran (long) completed without error on all 5
> drives in the system:
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed without error       00%       2887        -
> # 2  Extended offline    Completed without error       00%       2046        -
>
> So, if I understand correctly, the next step I'd do would be something like
>
> echo repair >/sys/block/md126/md/sync_action
>
> but I'm unclear about the need to do this when mdadm seems to think
> the RAID is clean:
>
> c2stable ~ # mdadm -D /dev/md126
> /dev/md126:
>         Version : 0.90
>   Creation Time : Tue Apr 13 09:02:34 2010
>      Raid Level : raid1
>      Array Size : 52436032 (50.01 GiB 53.69 GB)
>   Used Dev Size : 52436032 (50.01 GiB 53.69 GB)
>    Raid Devices : 3
>   Total Devices : 3
> Preferred Minor : 126
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Apr 25 18:29:39 2011
>           State : clean
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>            UUID : edb0ed65:6e87b20e:dc0d88ba:780ef6a3
>          Events : 0.248880
>
>     Number   Major   Minor   RaidDevice State
>        0       8        5        0      active sync   /dev/sda5
>        1       8       21        1      active sync   /dev/sdb5
>        2       8       37        2      active sync   /dev/sdc5
> c2stable ~ #
>
> Thanks in advance.
>
> Cheers,
> Mark

OK, I don't know exactly what sort of problem I'm looking at here. I ran the
repair, then rebooted; the mismatch count was zero, so it seemed the repair
had worked. I then used the system for about 4 hours, did another check, and
found the mismatch count had increased again.
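For reference, the cycle I've been running by hand boils down to roughly the
following (just a sketch -- md126 is my RAID1 root, and the idle-polling loop
is only how I'd script it, not something I've actually automated):

#!/bin/sh
# Run a read-and-compare pass over the RAID1 root array and report the result.
MD=md126

echo check > /sys/block/$MD/md/sync_action      # read all mirrors and compare

# Wait for the pass to finish; sync_action reads "idle" again when it's done.
while [ "$(cat /sys/block/$MD/md/sync_action)" != "idle" ]; do
    sleep 60
done

cat /sys/block/$MD/md/mismatch_cnt              # non-zero means the copies differ

# To actually rewrite the inconsistent blocks, use "repair" instead of "check":
#   echo repair > /sys/block/$MD/md/sync_action
# (As I understand it, on RAID1 this just picks one copy and rewrites the
#  others from it; it has no way of knowing which copy is "right".)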
What I need to get a handle on is:

1) Is this serious? (I assume yes.)

2) How do I figure out which of the three drives is having trouble?
   (I've put a rough idea at the bottom of this mail.)

3) If there is a specific drive, what is the process to swap it out?

Thanks,
Mark

c2stable ~ # cat /sys/block/md126/md/mismatch_cnt
0
c2stable ~ # echo check >/sys/block/md126/md/sync_action
c2stable ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md6 : active raid1 sdb6[1] sdc6[2] sda6[0]
      247416933 blocks super 1.1 [3/3] [UUU]

md7 : active raid6 sdb7[1] sdc7[2] sde2[4] sda7[0] sdd2[3]
      395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md3 : active raid6 sdb3[1] sdc3[2] sda3[0] sdd3[3] sde3[4]
      157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md126 : active raid1 sdc5[2] sdb5[1] sda5[0]
      52436032 blocks [3/3] [UUU]
      [>....................]  check =  1.1% (626560/52436032) finish=11.0min speed=78320K/sec

unused devices: <none>
c2stable ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md6 : active raid1 sdb6[1] sdc6[2] sda6[0]
      247416933 blocks super 1.1 [3/3] [UUU]

md7 : active raid6 sdb7[1] sdc7[2] sde2[4] sda7[0] sdd2[3]
      395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md3 : active raid6 sdb3[1] sdc3[2] sda3[0] sdd3[3] sde3[4]
      157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md126 : active raid1 sdc5[2] sdb5[1] sda5[0]
      52436032 blocks [3/3] [UUU]
      [===========>.........]  check = 59.6% (31291776/52436032) finish=5.5min speed=63887K/sec

unused devices: <none>
c2stable ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md6 : active raid1 sdb6[1] sdc6[2] sda6[0]
      247416933 blocks super 1.1 [3/3] [UUU]

md7 : active raid6 sdb7[1] sdc7[2] sde2[4] sda7[0] sdd2[3]
      395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md3 : active raid6 sdb3[1] sdc3[2] sda3[0] sdd3[3] sde3[4]
      157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md126 : active raid1 sdc5[2] sdb5[1] sda5[0]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
c2stable ~ # cat /sys/block/md126/md/mismatch_cnt
7424
c2stable ~ #
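Regarding question 2, one idea I had for narrowing down which member is
diverging (no idea whether this is the recommended approach; sda5/sdb5/sdc5
are simply the members of my md126, and the 1 GiB sample size is arbitrary)
is to checksum the same slice of each raw member while the box is as quiet
as possible, and see which of the three disagrees with the other two:

#!/bin/sh
# Checksum the first 1 GiB of each RAID1 member of md126.  With RAID1 the
# data layout is identical on every member, so the sums should agree if the
# mirrors are in sync; the odd one out would be the suspect drive.  Only
# meaningful while the filesystem is quiet (ideally mounted read-only).
for dev in /dev/sda5 /dev/sdb5 /dev/sdc5; do
    sum=$(dd if="$dev" bs=1M count=1024 2>/dev/null | md5sum | cut -d' ' -f1)
    echo "$dev  $sum"
done

Corrections welcome if that's the wrong way to go about it.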