On Mon, Apr 27, 2015 at 1:59 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Sun, 26 Apr 2015 20:56:58 -0500 David Wahler <dwahler@xxxxxxxxx> wrote:
>
>> [oops, forgot to cc the list]
>>
>> On Sun, Apr 26, 2015 at 8:20 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> > Anyway:
>> >
>> >   Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
>> >
>> > that is the only thing that looks at all interesting.  Particularly the last
>> > 3 words.
>> > What does
>> >   mdadm --examine-badblocks /dev/sd[cde]1
>> > show?
>>
>> root@ceres:~# mdadm --examine-badblocks /dev/sd[cde]1
>> Bad-blocks on /dev/sdc1:
>>             3699640928 for 32 sectors
>> Bad-blocks on /dev/sdd1:
>>             3699640928 for 32 sectors
>> Bad-blocks on /dev/sde1:
>>             3699640928 for 32 sectors
>>
>> Hmm, that seems kind of odd to me. For what it's worth, all four
>> drives passed a SMART self-test, and "dd > /dev/null" completed
>> without errors on all of them. I just read about the "badblocks" tool
>> and I'm running it now.
>
> The array is reshaping a RAID6 from 4->5 devices, so that is 2 data disks to
> 3 data disks.
>
>     Reshape pos'n : 5548735488 (5291.69 GiB 5681.91 GB)
>
> so it is about 5.3TB through the array, so it has read about 2.6TB from the
> devices and written about 1.7TB to the devices.
>
> 3699640928 sectors is about 1.8TB. That seems a little too close to be a
> coincidence.
>
> Maybe when the reshape writes to somewhere that is a bad block, it gets
> confused.
>
> On the other hand, when it copied from the 1.8TB address to the 1.2TB address
> it should have recorded that it was bad blocks that were being copied. It
> doesn't seem like it did.
>
> I'll have to look at the code and try to figure out what is happening.
>
> I don't think there is anything useful you can do in the mean time...
>
> NeilBrown

Thanks, I appreciate you taking the time to look at this.
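Incidentally, your offset arithmetic does seem to line up with the bad-block
address. Here's the back-of-the-envelope check I did (my own sketch, assuming
the reshape position is reported in KiB, that the bad-block entries are
per-device offsets in 512-byte sectors, and that the 4->5 device RAID6 reshape
goes from 2 to 3 data disks):

/* Rough check of the reshape/bad-block offsets quoted above. */
#include <stdio.h>

int main(void)
{
        unsigned long long reshape_pos_kib = 5548735488ULL; /* from mdadm --detail */
        unsigned long long badblock_sector = 3699640928ULL; /* from --examine-badblocks */

        /* Per-device offsets: array data divided across the data disks. */
        double read_tb  = reshape_pos_kib * 1024.0 / 2 / 1e12; /* old 2-data-disk layout */
        double write_tb = reshape_pos_kib * 1024.0 / 3 / 1e12; /* new 3-data-disk layout */
        double bad_tb   = badblock_sector * 512.0 / 1e12;

        printf("per-device read offset : %.2f TB\n", read_tb);  /* ~2.84 TB */
        printf("per-device write offset: %.2f TB\n", write_tb); /* ~1.89 TB */
        printf("bad-block address      : %.2f TB\n", bad_tb);   /* ~1.89 TB */
        return 0;
}

The per-device write offset and the recorded bad-block address both come out
to roughly 1.89 TB, within a few hundred MB of each other, so I agree it looks
like more than a coincidence.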
GDB tells me that the address where the reshape is getting stuck corresponds
to drivers/md/raid5.c:4933. I also noticed that there's a kernel thread
"md127_raid6" occupying 100% of a CPU core. Here's a representative stack for
that thread (with line numbers added by me):

[<ffffffffa06aeb20>] raid_run_ops+0xbe0/0x1170 [raid456]                  arch/x86/include/asm/bitops.h:311
[<ffffffffa06ab277>] ops_run_io+0x27/0x970 [raid456]                      drivers/md/raid5.c:742
[<ffffffffa06a9940>] ops_complete_reconstruct+0x0/0x1c0 [raid456]         drivers/md/raid5.c:1433
[<ffffffffa06b2a00>] handle_stripe+0xbd0/0x2650 [raid456]                 drivers/md/raid5.c:4099
[<ffffffff810a3162>] pick_next_task_fair+0x652/0x850                      kernel/sched/fair.c:5047
[<ffffffffa06b463d>] handle_active_stripes.isra.41+0x1bd/0x4b0 [raid456]  drivers/md/raid5.c:5248
[<ffffffff810115eb>] __switch_to+0x14b/0x5d0                              arch/x86/include/asm/paravirt.h:276
[<ffffffffa06b4d7c>] raid5d+0x44c/0x5b0 [raid456]                         drivers/md/raid5.c:5340
[<ffffffffa0760c56>] md_thread+0x126/0x130 [md_mod]
[<ffffffff810a86f0>] autoremove_wake_function+0x0/0x30
[<ffffffffa0760b30>] md_thread+0x0/0x130 [md_mod]
[<ffffffff81089403>] kthread+0xd3/0xf0
[<ffffffff81089330>] kthread+0x0/0xf0
[<ffffffff81560598>] ret_from_fork+0x58/0x90
[<ffffffff81089330>] kthread+0x0/0xf0
[<ffffffffffffffff>] 0xffffffffffffffff

And the output of running "perf record" for a few seconds followed by
"perf report":

# ========
# captured on: Mon Apr 27 12:07:01 2015
# hostname : ceres
# os release : 4.0.0
# perf version : 4.0.g39a880
# arch : x86_64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
# cpuid : GenuineIntel,6,60,3
# total memory : 16112348 kB
# cmdline : /home/david/software/linux/tools/perf/perf record -p 508
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 1, attr_mmap = 1, attr_mmap_data = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, power = 6, uncore_imc = 7, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 84K of event 'cycles'
# Event count (approx.): 76131836770
#
# Overhead  Command      Shared Object      Symbol
# ........  ...........  .................  ...........................................
#
    41.90%  md127_raid6  [raid6_pq]         [k] raid6_avx24_gen_syndrome
    19.31%  md127_raid6  [raid456]          [k] analyse_stripe
     8.48%  md127_raid6  [raid456]          [k] ops_run_io
     4.05%  md127_raid6  [raid456]          [k] handle_stripe
     3.72%  md127_raid6  [raid456]          [k] ops_complete_reconstruct
     3.30%  md127_raid6  [kernel.kallsyms]  [k] __kernel_fpu_end
     3.07%  md127_raid6  [raid456]          [k] schedule_reconstruction
     3.01%  md127_raid6  [raid456]          [k] raid5_compute_sector
     1.42%  md127_raid6  [raid456]          [k] handle_active_stripes.isra.41
     1.33%  md127_raid6  [kernel.kallsyms]  [k] __kernel_fpu_begin
     1.33%  md127_raid6  [raid456]          [k] do_release_stripe
     1.16%  md127_raid6  [raid456]          [k] raid_run_ops
     1.14%  md127_raid6  [md_mod]           [k] md_is_badblock
     1.12%  md127_raid6  [raid456]          [k] set_syndrome_sources
     1.07%  md127_raid6  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     0.93%  md127_raid6  [async_pq]         [k] async_gen_syndrome
     0.69%  md127_raid6  [raid456]          [k] release_stripe
     0.64%  md127_raid6  [md_mod]           [k] md_wakeup_thread
     0.54%  md127_raid6  [raid456]          [k] __release_stripe
     0.40%  md127_raid6  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     0.22%  md127_raid6  [raid456]          [k] release_stripe_list
     0.20%  md127_raid6  [kernel.kallsyms]  [k] __wake_up_common
     0.17%  md127_raid6  [raid456]          [k] release_inactive_stripe_list
     0.16%  md127_raid6  [kernel.kallsyms]  [k] __wake_up
     0.13%  md127_raid6  [raid456]          [k] raid5d
     0.11%  md127_raid6  [kernel.kallsyms]  [k] _raw_spin_lock_irq
     0.10%  md127_raid6  [raid456]          [k] return_io
     [...etc...]

I don't urgently need this array up and running, so I'm happy to leave it in
its current state for the next few days in case there's anything else I can
do to help track this down.

-- David

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html