2011/12/11 NeilBrown <neilb@xxxxxxx>:
> On Fri, 9 Dec 2011 08:53:42 -0500 Gavin Peters (蓋文彼德斯) <gavin@xxxxxx>
> wrote:
>
>> I tried to reshape today, a raid6 array from seven devices up to
>> eight. I ran mdadm 3.2.2, something like
>>
>> # mdadm /dev/md2 --grow -n 8 --layout=preserve
>>
>> and then, blammo!
>>
>> Dec 8 22:30:10 avclub kernel: [ 527.094708] RAID5 conf printout:
>> Dec 8 22:30:10 avclub kernel: [ 527.094712] --- rd:8 wd:8
>> Dec 8 22:30:10 avclub kernel: [ 527.094714] disk 0, o:1, dev:sdc6
>> Dec 8 22:30:10 avclub kernel: [ 527.094715] disk 1, o:1, dev:sdf6
>> Dec 8 22:30:10 avclub kernel: [ 527.094717] disk 2, o:1, dev:sda6
>> Dec 8 22:30:10 avclub kernel: [ 527.094718] disk 3, o:1, dev:sdd6
>> Dec 8 22:30:10 avclub kernel: [ 527.094719] disk 4, o:1, dev:sdb6
>> Dec 8 22:30:10 avclub kernel: [ 527.094720] disk 5, o:1, dev:sde6
>> Dec 8 22:30:10 avclub kernel: [ 527.094721] disk 6, o:1, dev:sdg6
>> Dec 8 22:30:10 avclub kernel: [ 527.094722] disk 7, o:1, dev:sdh6
>> Dec 8 22:30:10 avclub kernel: [ 527.094876] md: reshape of RAID array md2
>> Dec 8 22:30:10 avclub kernel: [ 527.094886] md: minimum _guaranteed_ speed: 40000 KB/sec/disk.
>> Dec 8 22:30:10 avclub kernel: [ 527.094892] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
>> Dec 8 22:30:10 avclub kernel: [ 527.094912] md: using 128k window, over a total of 1371476928 blocks.
>> Dec 8 22:30:11 avclub mdadm[2959]: RebuildStarted event detected on md device /dev/md2
>> Dec 8 22:30:11 avclub kernel: [ 527.515359] general protection fault: 0000 [#1] SMP
>> Dec 8 22:30:11 avclub kernel: [ 527.515370] last sysfs file: /sys/devices/virtual/block/md2/md/sync_speed
>> Dec 8 22:30:11 avclub kernel: [ 527.515376] CPU 5
>> Dec 8 22:30:11 avclub kernel: [ 527.515381] Modules linked in: binfmt_misc nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc snd_usb_audio snd_usb_lib snd_hda_codec_atihdmi fbcon tileblit font bitblit softcursor vga16fb vgastate snd_hda_codec_via snd_hda_intel snd_pcm_oss snd_hda_codec snd_mixer_oss snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device radeon ttm asus_atk0110 drm_kms_helper ppdev snd drm i2c_algo_bit parport_pc edac_core edac_mce_amd gspca_zc3xx gspca_main videodev v4l1_compat v4l2_compat_ioctl32 soundcore snd_page_alloc i2c_piix4 shpchp lp parport tcp_vegas raid10 raid456 async_pq async_xor xor async_memcpy usbhid async_raid6_recov hid raid6_pq async_tx raid1 raid0 pata_atiixp r8169 mii multipath ahci linear [last unloaded: kvm]
>> Dec 8 22:30:11 avclub kernel: [ 527.515500] Pid: 528, comm: md2_raid6 Not tainted 2.6.32-32-generic #62-Ubuntu System Product Name
>> Dec 8 22:30:11 avclub kernel: [ 527.515507] RIP: 0010:[<ffffffff812be15b>] [<ffffffff812be15b>] memcpy_c+0xb/0x20
>> Dec 8 22:30:11 avclub kernel: [ 527.515526] RSP: 0018:ffff880408985c18 EFLAGS: 00010246
>> Dec 8 22:30:11 avclub kernel: [ 527.515531] RAX: db73880000000000 RBX: ffff880408984000 RCX: 0000000000000200
>> Dec 8 22:30:11 avclub kernel: [ 527.515537] RDX: 0000000000000000 RSI: ffff880369717000 RDI: db73880000000000
>> Dec 8 22:30:11 avclub kernel: [ 527.515543] RBP: ffff880408985c80 R08: 0000000000001000 R09: ffff880408985ca0
>> Dec 8 22:30:11 avclub kernel: [ 527.515548] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880408985ca0
>> Dec 8 22:30:11 avclub kernel: [ 527.515553] R13: ffff880369741290 R14: 0000000000000000 R15: 0000000000000000
>> Dec 8 22:30:11 avclub kernel: [ 527.515560] FS: 00007f465923d7a0(0000) GS:ffff880028340000(0000) knlGS:00000000f6990760
>> Dec 8 22:30:11 avclub kernel: [ 527.515566] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> Dec 8 22:30:11 avclub kernel: [ 527.515571] CR2: 00007fe6aaf92000 CR3: 00000003c3a5e000 CR4: 00000000000006e0
>> Dec 8 22:30:11 avclub kernel: [ 527.515576] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Dec 8 22:30:11 avclub kernel: [ 527.515582] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Dec 8 22:30:11 avclub kernel: [ 527.515589] Process md2_raid6 (pid: 528, threadinfo ffff880408984000, task ffff88040b210000)
>> Dec 8 22:30:11 avclub kernel: [ 527.515593] Stack:
>> Dec 8 22:30:11 avclub kernel: [ 527.515596] ffffffffa004a0e7 ffff880408985c50 0000000000000000 0000000000000000
>> Dec 8 22:30:11 avclub kernel: [ 527.515604] <0> ffffea000bf10d08 0000000000000000 0000000000001000 ffff880408985c80
>> Dec 8 22:30:11 avclub kernel: [ 527.515614] <0> 0000000000000000 ffff8803696a6930 ffff880369741290 ffff880408985d70
>> Dec 8 22:30:11 avclub kernel: [ 527.515624] Call Trace:
>> Dec 8 22:30:11 avclub kernel: [ 527.515639] [<ffffffffa004a0e7>] ? async_memcpy+0xe7/0x25c [async_memcpy]
>> Dec 8 22:30:11 avclub kernel: [ 527.515654] [<ffffffffa00aaabb>] handle_stripe_expansion+0x14b/0x1e0 [raid456]
>> Dec 8 22:30:11 avclub kernel: [ 527.515668] [<ffffffffa00ab113>] handle_stripe6+0x5c3/0xb40 [raid456]
>> Dec 8 22:30:11 avclub kernel: [ 527.515680] [<ffffffffa00a794c>] ? __release_stripe+0xcc/0x1c0 [raid456]
>> Dec 8 22:30:11 avclub kernel: [ 527.515692] [<ffffffffa00ac055>] handle_stripe+0x25/0x30 [raid456]
>> Dec 8 22:30:11 avclub kernel: [ 527.515703] [<ffffffffa00ac452>] raid5d+0x202/0x320 [raid456]
>> Dec 8 22:30:11 avclub kernel: [ 527.515716] [<ffffffff815416b9>] ? _spin_unlock_irqrestore+0x19/0x30
>> Dec 8 22:30:11 avclub kernel: [ 527.515725] [<ffffffff8141704c>] md_thread+0x5c/0x130
>> Dec 8 22:30:11 avclub kernel: [ 527.515735] [<ffffffff81084cb0>] ? autoremove_wake_function+0x0/0x40
>> Dec 8 22:30:11 avclub kernel: [ 527.515743] [<ffffffff81416ff0>] ? md_thread+0x0/0x130
>> Dec 8 22:30:11 avclub kernel: [ 527.515750] [<ffffffff81084936>] kthread+0x96/0xa0
>> Dec 8 22:30:11 avclub kernel: [ 527.515758] [<ffffffff810131ea>] child_rip+0xa/0x20
>> Dec 8 22:30:11 avclub kernel: [ 527.515766] [<ffffffff810848a0>] ? kthread+0x0/0xa0
>> Dec 8 22:30:11 avclub kernel: [ 527.515772] [<ffffffff810131e0>] ? child_rip+0x0/0x20
>> Dec 8 22:30:11 avclub kernel: [ 527.515776] Code: 81 ea d8 1f 00 00 48 3b 42 20 73 07 48 8b 50 f9 31 c0 c3 31 d2 48 c7 c0 f2 ff ff ff c3 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
>> Dec 8 22:30:11 avclub kernel: [ 527.515842] RIP [<ffffffff812be15b>] memcpy_c+0xb/0x20
>> Dec 8 22:30:11 avclub kernel: [ 527.515850] RSP <ffff880408985c18>
>> Dec 8 22:30:11 avclub kernel: [ 527.515857] ---[ end trace 5146b1cc8ebe8dc1 ]---
>> Dec 8 22:30:11 avclub kernel: [ 527.515865] note: md2_raid6[528] exited with preempt_count 2
>> Dec 8 22:32:52 avclub kernel: Kernel logging (proc) stopped.
>>
>> I believe that last line shows me giving up. I am sad.
>> Thankfully, after rebooting into single user mode, I was able to mdadm
>> --assemble the array, and it appears to be working. Boy that was a
>> rush!
>>
>> $ uname -a
>> Linux avclub 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20 21:52:38 UTC 2011 x86_64 GNU/Linux
>>
>> Let me know if I can provide any other information.
>>
>
> Thanks for the report.
>
> It seems that as part of the reshape, md is trying to copy to an invalid
> memory address.
> It copies from 0xffff880369717000 (RSI) to 0xdb73880000000000 (rdi).
> The latter is clearly invalid.
>
> I have no idea how this might be happening. My best guess is that 'ddidx' in
> handle_stripe_expansion is getting a bad value but I cannot see how that
> would happen.
>
> If you have reasonable backups you could try again and see if it still fails.
> Maybe it was a one-off.

That appears to be the case. After I rebooted, I was able to assemble
the raid, and it continued the sync with no apparent loss of data.
Naturally, during that resync, I lost a drive, which I'm now doing a
series of resyncs to recover from (the drive reappeared on reboot, so
I can't quite figure out what's up with it)....

Hopefully, sometime in the next day or two I'll have a computer that
works, hasn't lost data, and is not running a RAID resync!

If I can be of any help debugging the problem I had, let me know.
Otherwise, I hope you don't hear from me again. :-D

- Gavin

>
> Not sure what else to suggest. It might be fixed in a newer kernel, or it
> might not...
>
> NeilBrown
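
A note on why the destination address in the oops is "clearly invalid", and
why it shows up as a general protection fault rather than a page fault: on
x86-64 a virtual address has to be canonical, and as far as I know an access
through a non-canonical address raises #GP. The sketch below is a minimal
check, assuming 48-bit virtual addresses (which I believe is what this
2.6.32 x86_64 kernel uses). is_canonical and VA_BITS are names made up for
illustration, not kernel interfaces. The last lines assume, from my reading
of the Code: bytes, that the faulting instruction is a rep movsq with RCX as
the remaining quadword count.

```python
# Sketch: test whether a 64-bit value is a canonical x86-64 virtual address.
# Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal.
# is_canonical and VA_BITS are hypothetical names, not kernel APIs.

VA_BITS = 48  # assumed virtual address width


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all 0 or all 1."""
    top = addr >> (va_bits - 1)                # keep bits 63..47
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits is 48
    return top == 0 or top == all_ones


rsi = 0xffff880369717000  # source pointer from the oops (RSI)
rdi = 0xdb73880000000000  # destination pointer from the oops (RDI)

for name, value in (("RSI", rsi), ("RDI", rdi)):
    print(f"{name} = {value:#018x} canonical: {is_canonical(value)}")

# Assumption: RCX is the remaining count of 8-byte words for the copy loop.
rcx = 0x200
print(f"bytes still to copy: {rcx * 8:#x}")
```

If that reading is right, 0x200 quadwords is 0x1000 bytes, the same value as
R08 in the register dump, which would mean the copy faulted on its very first
word. RSI passes the check and RDI does not, since its top bits are 0xdb73
rather than 0xffff.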