On Mon, Dec 12, 2011 at 5:01 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Sun, 11 Dec 2011 09:03:14 +0200 Anssi Hannula <anssi.hannula@xxxxxx> wrote:
>
>> Hi!
>>
>> After I rebooted during a raid6 rebuild, the rebuild didn't start again.
>> Instead, there is a flood of "RAID conf printout"s that seemingly happen
>> on array activity.
>>
>> All the devices show up properly in --detail and two devices are marked
>> as "spare rebuilding", and I can access the contents of the array just
>> fine, but the rebuild doesn't actually start. Is this a bug or am I
>> missing something? :)
>>
>> I was initially on 2.6.38.8, but also tried 3.1.4 which seems to have
>> the same issue. mdadm is 3.1.5.
>>
>> I'm not using start_ro and writing to the array doesn't trigger a
>> rebuild either.
>>
>> Attached are --examine outputs before assembly, kernel log output on
>> assembly, /proc/mdstat and --detail after assembly (on 3.1.4).
>>
>
> Thank you for the very detailed problem report.

Thanks for the quick response :)

> Unfortunately it is a complete mystery to me what is happening.
>
> The repeated "RAID conf printout" messages are almost certainly coming from
> the end of raid5_remove_disk.
> It is being called from remove_and_add_spares for each of the two devices
> that are being rebuilt. raid5_remove_disk declines to remove them because it
> can keep rebuilding them.
>
> remove_and_add_spares then counts them and notes there are 2.
> md_check_recovery notes that this is > 0, so it should create a thread to run
> md_do_sync.
>
> md_do_sync should then print out a message like
>   md: recovery of RAID array md0
>
> but it doesn't. So something went wrong.
> There are three reasons that md_do_sync might not print a message:
>
> 1/ MD_RECOVERY_DONE is set. As only md_do_sync ever sets it, that is
>    unlikely, and in any case md_check_recovery clears it.
> 2/ mddev->ro != 0. It is only ever set to 0, 1, or 2. If it is 1 or 2
>    then we would be able to see that in /proc/mdstat as a "(readonly)"
>    status. But we don't.
> 3/ MD_RECOVERY_INTR is set. Again, md_check_recovery clears this. It does
>    get set if kthread_should_stop() returns 'true', but that should only
>    happen if kthread_stop() was called. That is only called by
>    md_unregister_thread and I cannot see any way that could be called.
>
> So. No idea.
>
> Are you compiling these kernels yourself?

Nope (used Mageia kernels), but I did now (3.1.5).

> If so, could you:
>  - put a printk in the top of md_do_sync to report the values of
>    mddev->recovery and mddev->ro
>  - print a message whenever md_unregister_thread is called
>  - in md_check_recovery, in the
>      if (mddev->ro) {
>              /* Only thing we do on a ro array is remove
>               * failed devices.
>               */
>              mdk_rdev_t *rdev;
>
>    if statement, print the value of mddev->ro.
>
> Then see which of those printk's fire, and what they tell us.

Only the last one does, and mddev->ro == 0.

For reference, attached is the used patch and resulting log output.

--
Anssi Hannula
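To make Neil's three exit conditions easier to follow, here is a small stand-alone C model of the decision he describes. Everything in it (the struct, the MODEL_* flag values, would_start_sync()) is illustrative only, not the actual md.c code; the real checks live in md_do_sync() in drivers/md/md.c and use test_bit() on mddev->recovery.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins only; the real kernel uses bit numbers from
 * drivers/md/md.h and test_bit() on mddev->recovery. */
#define MODEL_RECOVERY_DONE (1u << 0)
#define MODEL_RECOVERY_INTR (1u << 1)

struct model_mddev {
        unsigned int recovery;  /* recovery flag bits */
        int ro;                 /* 0 = read-write; 1 or 2 = read-only variants */
};

/* Mirrors the three early-exit reasons listed in the mail: md_do_sync()
 * bails out (and never prints "md: recovery of RAID array ...") if
 * RECOVERY_DONE is set, if the array is read-only, or if RECOVERY_INTR
 * is set. */
static bool would_start_sync(const struct model_mddev *mddev)
{
        if (mddev->recovery & MODEL_RECOVERY_DONE)
                return false;   /* reason 1 */
        if (mddev->ro != 0)
                return false;   /* reason 2 */
        if (mddev->recovery & MODEL_RECOVERY_INTR)
                return false;   /* reason 3 */
        return true;
}

int main(void)
{
        /* The state reported in this thread: md_check_recovery() clears
         * DONE and INTR before starting the thread, and the debug printk
         * showed mddev->ro == 0 ... */
        struct model_mddev reported = { .recovery = 0, .ro = 0 };

        /* ... so by these three checks alone the rebuild should start. */
        printf("would start sync: %s\n",
               would_start_sync(&reported) ? "yes" : "no");
        return 0;
}

Run with the state reported above, it prints "would start sync: yes", which is exactly the puzzle: by these three checks the rebuild should be starting, yet "md: recovery of RAID array md0" never appears in the log below.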
[ 46.356559] md: md0 stopped.
[ 46.363773] md: bind<sdq1>
[ 46.363920] md: bind<sdk1>
[ 46.364035] md: bind<sdh1>
[ 46.364170] md: bind<sdj1>
[ 46.364567] md: bind<sdl1>
[ 46.364972] md: bind<sdp1>
[ 46.365184] md: bind<sdf1>
[ 46.365603] md: bind<sdm1>
[ 46.366025] md: bind<sdn1>
[ 46.366219] md: bind<sdi1>
[ 46.366464] md: bind<sdg1>
[ 46.366751] md: bind<sde1>
[ 46.367222] md: bind<sdo1>
[ 46.374687] async_tx: api initialized (async)
[ 46.547228] raid6: int64x1  2723 MB/s
[ 46.717043] raid6: int64x2  3533 MB/s
[ 46.886928] raid6: int64x4  2398 MB/s
[ 47.056795] raid6: int64x8  2482 MB/s
[ 47.226617] raid6: sse2x1   7368 MB/s
[ 47.396465] raid6: sse2x2   8686 MB/s
[ 47.566333] raid6: sse2x4   9614 MB/s
[ 47.566336] raid6: using algorithm sse2x4 (9614 MB/s)
[ 47.573931] xor: automatically using best checksumming function: generic_sse
[ 47.616567] generic_sse: 12849.600 MB/sec
[ 47.616569] xor: using function: generic_sse (12849.600 MB/sec)
[ 47.625852] md: raid6 personality registered for level 6
[ 47.625854] md: raid5 personality registered for level 5
[ 47.625855] md: raid4 personality registered for level 4
[ 47.626275] md/raid:md0: device sdo1 operational as raid disk 0
[ 47.626277] md/raid:md0: device sdn1 operational as raid disk 9
[ 47.626279] md/raid:md0: device sdm1 operational as raid disk 8
[ 47.626280] md/raid:md0: device sdp1 operational as raid disk 6
[ 47.626281] md/raid:md0: device sdl1 operational as raid disk 5
[ 47.626283] md/raid:md0: device sdj1 operational as raid disk 4
[ 47.626284] md/raid:md0: device sdk1 operational as raid disk 2
[ 47.626285] md/raid:md0: device sdq1 operational as raid disk 1
[ 47.626899] md/raid:md0: allocated 10592kB
[ 47.627049] md/raid:md0: raid level 6 active with 8 out of 10 devices, algorithm 2
[ 47.627150] RAID conf printout:
[ 47.627152]  --- level:6 rd:10 wd:8
[ 47.627155]  disk 0, o:1, dev:sdo1
[ 47.627157]  disk 1, o:1, dev:sdq1
[ 47.627163]  disk 2, o:1, dev:sdk1
[ 47.627165]  disk 3, o:1, dev:sdh1
[ 47.627167]  disk 4, o:1, dev:sdj1
[ 47.627169]  disk 5, o:1, dev:sdl1
[ 47.627171]  disk 6, o:1, dev:sdp1
[ 47.627173]  disk 7, o:1, dev:sdf1
[ 47.627175]  disk 8, o:1, dev:sdm1
[ 47.627177]  disk 9, o:1, dev:sdn1
[ 47.627337] created bitmap (15 pages) for device md0
[ 48.028349] md0: bitmap initialized from disk: read 1/1 pages, set 257 of 29809 bits
[ 48.476099] md0: detected capacity change from 0 to 16003169779712
[ 48.476739] md0: in md_check_recovery(), mddev->ro = 0
[ 48.476742] RAID conf printout:
[ 48.476744]  --- level:6 rd:10 wd:8
[ 48.476746]  disk 0, o:1, dev:sdo1
[ 48.476748]  disk 1, o:1, dev:sdq1
[ 48.476749]  disk 2, o:1, dev:sdk1
[ 48.476751]  disk 3, o:1, dev:sdh1
[ 48.476753]  disk 4, o:1, dev:sdj1
[ 48.476754]  disk 5, o:1, dev:sdl1
[ 48.476756]  disk 6, o:1, dev:sdp1
[ 48.476757]  disk 7, o:1, dev:sdf1
[ 48.476758]  disk 8, o:1, dev:sdm1
[ 48.476760]  disk 9, o:1, dev:sdn1
[ 48.476761] RAID conf printout:
[ 48.476762]  --- level:6 rd:10 wd:8
[ 48.476763]  disk 0, o:1, dev:sdo1
[ 48.476764]  disk 1, o:1, dev:sdq1
[ 48.476766]  disk 2, o:1, dev:sdk1
[ 48.476767]  disk 3, o:1, dev:sdh1
[ 48.476768]  disk 4, o:1, dev:sdj1
[ 48.476770]  disk 5, o:1, dev:sdl1
[ 48.476771]  disk 6, o:1, dev:sdp1
[ 48.476772]  disk 7, o:1, dev:sdf1
[ 48.476773]  disk 8, o:1, dev:sdm1
[ 48.476775]  disk 9, o:1, dev:sdn1
[ 48.476775] RAID conf printout:
[ 48.476777]  --- level:6 rd:10 wd:8
[ 48.476778]  disk 0, o:1, dev:sdo1
[ 48.476779]  disk 1, o:1, dev:sdq1
[ 48.476780]  disk 2, o:1, dev:sdk1
[ 48.476781]  disk 3, o:1, dev:sdh1
[ 48.476783]  disk 4, o:1, dev:sdj1
[ 48.476784]  disk 5, o:1, dev:sdl1
[ 48.476785]  disk 6, o:1, dev:sdp1
[ 48.476786]  disk 7, o:1, dev:sdf1
[ 48.476788]  disk 8, o:1, dev:sdm1
[ 48.476789]  disk 9, o:1, dev:sdn1
[ 48.476790] RAID conf printout:
[ 48.476791]  --- level:6 rd:10 wd:8
[ 48.476792]  disk 0, o:1, dev:sdo1
[ 48.476793]  disk 1, o:1, dev:sdq1
[ 48.476795]  disk 2, o:1, dev:sdk1
[ 48.476796]  disk 3, o:1, dev:sdh1
[ 48.476798]  disk 4, o:1, dev:sdj1
[ 48.476799]  disk 5, o:1, dev:sdl1
[ 48.476800]  disk 6, o:1, dev:sdp1
[ 48.476802]  disk 7, o:1, dev:sdf1
[ 48.476803]  disk 8, o:1, dev:sdm1
[ 48.476804]  disk 9, o:1, dev:sdn1
[ 48.476805] RAID conf printout:
[ 48.476806]  --- level:6 rd:10 wd:8
[ 48.476807]  disk 0, o:1, dev:sdo1
[ 48.476809]  disk 1, o:1, dev:sdq1
[ 48.476810]  disk 2, o:1, dev:sdk1
[ 48.476811]  disk 3, o:1, dev:sdh1
[ 48.476812]  disk 4, o:1, dev:sdj1
[ 48.476813]  disk 5, o:1, dev:sdl1
[ 48.476815]  disk 6, o:1, dev:sdp1
[ 48.476816]  disk 7, o:1, dev:sdf1
[ 48.476817]  disk 8, o:1, dev:sdm1
[ 48.476819]  disk 9, o:1, dev:sdn1
[ 48.488210] md0: unknown partition table
[ 49.328595] md1: in md_check_recovery(), mddev->ro = 0
[ 49.329692] EXT4-fs (dm-0): re-mounted. Opts: (null)
[ 49.534829] md1: in md_check_recovery(), mddev->ro = 0
[ 49.792196] md0: in md_check_recovery(), mddev->ro = 0
[ 50.349944] RAID conf printout:
[ 50.349949]  --- level:6 rd:10 wd:8
[ 50.349952]  disk 0, o:1, dev:sdo1
[ 50.349955]  disk 1, o:1, dev:sdq1
[ 50.349957]  disk 2, o:1, dev:sdk1
[ 50.349959]  disk 3, o:1, dev:sdh1
[ 50.349961]  disk 4, o:1, dev:sdj1
[ 50.349964]  disk 5, o:1, dev:sdl1
[ 50.349966]  disk 6, o:1, dev:sdp1
[ 50.349968]  disk 7, o:1, dev:sdf1
[ 50.349970]  disk 8, o:1, dev:sdm1
[ 50.349972]  disk 9, o:1, dev:sdn1
[ 50.349974] RAID conf printout:
[ 50.349976]  --- level:6 rd:10 wd:8
[ 50.349978]  disk 0, o:1, dev:sdo1
[ 50.349980]  disk 1, o:1, dev:sdq1
[ 50.349982]  disk 2, o:1, dev:sdk1
[ 50.349984]  disk 3, o:1, dev:sdh1
[ 50.349986]  disk 4, o:1, dev:sdj1
[ 50.349988]  disk 5, o:1, dev:sdl1
[ 50.349990]  disk 6, o:1, dev:sdp1
[ 50.349992]  disk 7, o:1, dev:sdf1
[ 50.349994]  disk 8, o:1, dev:sdm1
[ 50.349996]  disk 9, o:1, dev:sdn1
[ 50.349997] RAID conf printout:
[ 50.349999]  --- level:6 rd:10 wd:8
[ 50.350001]  disk 0, o:1, dev:sdo1
[ 50.350003]  disk 1, o:1, dev:sdq1
[ 50.350005]  disk 2, o:1, dev:sdk1
[ 50.350007]  disk 3, o:1, dev:sdh1
[ 50.350009]  disk 4, o:1, dev:sdj1
[ 50.350011]  disk 5, o:1, dev:sdl1
[ 50.350013]  disk 6, o:1, dev:sdp1
[ 50.350015]  disk 7, o:1, dev:sdf1
[ 50.350017]  disk 8, o:1, dev:sdm1
[ 50.350019]  disk 9, o:1, dev:sdn1
[ 50.350020] RAID conf printout:
[ 50.350022]  --- level:6 rd:10 wd:8
[ 50.350024]  disk 0, o:1, dev:sdo1
[ 50.350026]  disk 1, o:1, dev:sdq1
[ 50.350028]  disk 2, o:1, dev:sdk1
[ 50.350030]  disk 3, o:1, dev:sdh1
[ 50.350032]  disk 4, o:1, dev:sdj1
[ 50.350034]  disk 5, o:1, dev:sdl1
[ 50.350036]  disk 6, o:1, dev:sdp1
[ 50.350038]  disk 7, o:1, dev:sdf1
[ 50.350040]  disk 8, o:1, dev:sdm1
[ 50.350042]  disk 9, o:1, dev:sdn1
[ 50.350043] RAID conf printout:
[ 50.350045]  --- level:6 rd:10 wd:8
[ 50.350047]  disk 0, o:1, dev:sdo1
[ 50.350049]  disk 1, o:1, dev:sdq1
[ 50.350051]  disk 2, o:1, dev:sdk1
[ 50.350053]  disk 3, o:1, dev:sdh1
[ 50.350055]  disk 4, o:1, dev:sdj1
[ 50.350057]  disk 5, o:1, dev:sdl1
[ 50.350059]  disk 6, o:1, dev:sdp1
[ 50.350061]  disk 7, o:1, dev:sdf1
[ 50.350063]  disk 8, o:1, dev:sdm1
[ 50.350065]  disk 9, o:1, dev:sdn1
[ 50.941625] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
[ 50.966825] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: (null)
[ 50.979025] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null)
[ 50.991736] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 51.006856] loop: module loaded
[ 51.033690] md1: in md_check_recovery(), mddev->ro = 0
--- md-orig/md.c	2011-10-24 10:10:05.000000000 +0300
+++ 3.1.4-MD/md.c	2011-12-12 06:26:43.388401596 +0200
@@ -6442,6 +6442,7 @@ mdk_thread_t *md_register_thread(void (*
 void md_unregister_thread(mdk_thread_t **threadp)
 {
 	mdk_thread_t *thread = *threadp;
+	printk(KERN_INFO "md_unregister_thread()\n");
 	if (!thread)
 		return;
 	dprintk("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
@@ -6996,6 +6997,8 @@ void md_do_sync(mddev_t *mddev)
 	mdk_rdev_t *rdev;
 	char *desc;
 
+	printk(KERN_INFO "%s: md_do_sync(), recovery=0x%lx, ro=%d\n",
+		mdname(mddev), mddev->recovery, mddev->ro);
 	/* just incase thread restarts... */
 	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
 		return;
@@ -7445,6 +7452,8 @@ void md_check_recovery(mddev_t *mddev)
 	if (mddev_trylock(mddev)) {
 		int spares = 0;
 
+		printk(KERN_INFO "%s: in md_check_recovery(), mddev->ro = %d\n",
+			mdname(mddev), mddev->ro);
 		if (mddev->ro) {
 			/* Only thing we do on a ro array is remove
 			 * failed devices.