Re: failed reshape!

2011/12/11 NeilBrown <neilb@xxxxxxx>:
> On Fri, 9 Dec 2011 08:53:42 -0500 Gavin  Peters (蓋文彼德斯) <gavin@xxxxxx>
> wrote:
>
>> I tried to reshape today, a raid6 array from seven devices up to
>> eight.  I ran mdadm 3.2.2, something like
>>
>> # mdadm /dev/md2 --grow -n 8 --layout=preserve
>>
>> and then, blammo!
>>
>> Dec  8 22:30:10 avclub kernel: [  527.094708] RAID5 conf printout:
>> Dec  8 22:30:10 avclub kernel: [  527.094712]  --- rd:8 wd:8
>> Dec  8 22:30:10 avclub kernel: [  527.094714]  disk 0, o:1, dev:sdc6
>> Dec  8 22:30:10 avclub kernel: [  527.094715]  disk 1, o:1, dev:sdf6
>> Dec  8 22:30:10 avclub kernel: [  527.094717]  disk 2, o:1, dev:sda6
>> Dec  8 22:30:10 avclub kernel: [  527.094718]  disk 3, o:1, dev:sdd6
>> Dec  8 22:30:10 avclub kernel: [  527.094719]  disk 4, o:1, dev:sdb6
>> Dec  8 22:30:10 avclub kernel: [  527.094720]  disk 5, o:1, dev:sde6
>> Dec  8 22:30:10 avclub kernel: [  527.094721]  disk 6, o:1, dev:sdg6
>> Dec  8 22:30:10 avclub kernel: [  527.094722]  disk 7, o:1, dev:sdh6
>> Dec  8 22:30:10 avclub kernel: [  527.094876] md: reshape of RAID array md2
>> Dec  8 22:30:10 avclub kernel: [  527.094886] md: minimum _guaranteed_
>>  speed: 40000 KB/sec/disk.
>> Dec  8 22:30:10 avclub kernel: [  527.094892] md: using maximum
>> available idle IO bandwidth (but not more than 200000 KB/sec) for
>> reshape.
>> Dec  8 22:30:10 avclub kernel: [  527.094912] md: using 128k window,
>> over a total of 1371476928 blocks.
>> Dec  8 22:30:11 avclub mdadm[2959]: RebuildStarted event detected on
>> md device /dev/md2
>> Dec  8 22:30:11 avclub kernel: [  527.515359] general protection
>> fault: 0000 [#1] SMP
>> Dec  8 22:30:11 avclub kernel: [  527.515370] last sysfs file:
>> /sys/devices/virtual/block/md2/md/sync_speed
>> Dec  8 22:30:11 avclub kernel: [  527.515376] CPU 5
>> Dec  8 22:30:11 avclub kernel: [  527.515381] Modules linked in:
>> binfmt_misc nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc
>> snd_usb_audio snd_usb_lib snd_hda_codec_atihdmi fbcon tileblit font
>> bitblit softcursor
>>  vga16fb vgastate snd_hda_codec_via snd_hda_intel snd_pcm_oss
>> snd_hda_codec snd_mixer_oss snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss
>> snd_seq_midi
>> snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device radeon
>> ttm asus_atk0110 drm_kms_helper ppdev snd drm i2c_algo_bit parport_pc
>> edac_core edac_mce_amd gspca_zc3xx gspca_main videodev v4l1_compat
>> v4l2_compat_ioctl32 soundcore snd_page_alloc i2c_piix4 shpchp lp
>> parport tcp_vegas raid10 raid456 async_pq async_xor xor async_memcpy
>> usbhid async_raid6_recov hid raid6_pq async_tx raid1 raid0 pata_atiixp
>> r8169 mii multipath ahci linear [last unloaded: kvm]
>> Dec  8 22:30:11 avclub kernel: [  527.515500] Pid: 528, comm:
>> md2_raid6 Not tainted 2.6.32-32-generic #62-Ubuntu System Product Name
>> Dec  8 22:30:11 avclub kernel: [  527.515507] RIP:
>> 0010:[<ffffffff812be15b>]  [<ffffffff812be15b>] memcpy_c+0xb/0x20
>> Dec  8 22:30:11 avclub kernel: [  527.515526] RSP:
>> 0018:ffff880408985c18  EFLAGS: 00010246
>> Dec  8 22:30:11 avclub kernel: [  527.515531] RAX: db73880000000000
>> RBX: ffff880408984000 RCX: 0000000000000200
>> Dec  8 22:30:11 avclub kernel: [  527.515537] RDX: 0000000000000000
>> RSI: ffff880369717000 RDI: db73880000000000
>> Dec  8 22:30:11 avclub kernel: [  527.515543] RBP: ffff880408985c80
>> R08: 0000000000001000 R09: ffff880408985ca0
>> Dec  8 22:30:11 avclub kernel: [  527.515548] R10: 0000000000000000
>> R11: 0000000000000000 R12: ffff880408985ca0
>> Dec  8 22:30:11 avclub kernel: [  527.515553] R13: ffff880369741290
>> R14: 0000000000000000 R15: 0000000000000000
>> Dec  8 22:30:11 avclub kernel: [  527.515560] FS:
>> 00007f465923d7a0(0000) GS:ffff880028340000(0000)
>> knlGS:00000000f6990760
>> Dec  8 22:30:11 avclub kernel: [  527.515566] CS:  0010 DS: 0018 ES:
>> 0018 CR0: 000000008005003b
>> Dec  8 22:30:11 avclub kernel: [  527.515571] CR2: 00007fe6aaf92000
>> CR3: 00000003c3a5e000 CR4: 00000000000006e0
>> Dec  8 22:30:11 avclub kernel: [  527.515576] DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> Dec  8 22:30:11 avclub kernel: [  527.515582] DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Dec  8 22:30:11 avclub kernel: [  527.515589] Process md2_raid6 (pid:
>> 528, threadinfo ffff880408984000, task ffff88040b210000)
>> Dec  8 22:30:11 avclub kernel: [  527.515593] Stack:
>> Dec  8 22:30:11 avclub kernel: [  527.515596]  ffffffffa004a0e7
>> ffff880408985c50 0000000000000000 0000000000000000
>> Dec  8 22:30:11 avclub kernel: [  527.515604] <0> ffffea000bf10d08
>> 0000000000000000 0000000000001000 ffff880408985c80
>> Dec  8 22:30:11 avclub kernel: [  527.515614] <0> 0000000000000000
>> ffff8803696a6930 ffff880369741290 ffff880408985d70
>> Dec  8 22:30:11 avclub kernel: [  527.515624] Call Trace:
>> Dec  8 22:30:11 avclub kernel: [  527.515639]  [<ffffffffa004a0e7>] ?
>> async_memcpy+0xe7/0x25c [async_memcpy]
>> Dec  8 22:30:11 avclub kernel: [  527.515654]  [<ffffffffa00aaabb>]
>> handle_stripe_expansion+0x14b/0x1e0 [raid456]
>> Dec  8 22:30:11 avclub kernel: [  527.515668]  [<ffffffffa00ab113>]
>> handle_stripe6+0x5c3/0xb40 [raid456]
>> Dec  8 22:30:11 avclub kernel: [  527.515680]  [<ffffffffa00a794c>] ?
>> __release_stripe+0xcc/0x1c0 [raid456]
>> Dec  8 22:30:11 avclub kernel: [  527.515692]  [<ffffffffa00ac055>]
>> handle_stripe+0x25/0x30 [raid456]
>> Dec  8 22:30:11 avclub kernel: [  527.515703]  [<ffffffffa00ac452>]
>> raid5d+0x202/0x320 [raid456]
>> Dec  8 22:30:11 avclub kernel: [  527.515716]  [<ffffffff815416b9>] ?
>> _spin_unlock_irqrestore+0x19/0x30
>> Dec  8 22:30:11 avclub kernel: [  527.515725]  [<ffffffff8141704c>]
>> md_thread+0x5c/0x130
>> Dec  8 22:30:11 avclub kernel: [  527.515735]  [<ffffffff81084cb0>] ?
>> autoremove_wake_function+0x0/0x40
>> Dec  8 22:30:11 avclub kernel: [  527.515743]  [<ffffffff81416ff0>] ?
>> md_thread+0x0/0x130
>> Dec  8 22:30:11 avclub kernel: [  527.515750]  [<ffffffff81084936>]
>> kthread+0x96/0xa0
>> Dec  8 22:30:11 avclub kernel: [  527.515758]  [<ffffffff810131ea>]
>> child_rip+0xa/0x20
>> Dec  8 22:30:11 avclub kernel: [  527.515766]  [<ffffffff810848a0>] ?
>> kthread+0x0/0xa0
>> Dec  8 22:30:11 avclub kernel: [  527.515772]  [<ffffffff810131e0>] ?
>> child_rip+0x0/0x20
>> Dec  8 22:30:11 avclub kernel: [  527.515776] Code: 81 ea d8 1f 00 00
>> 48 3b 42 20 73 07 48 8b 50 f9 31 c0 c3 31 d2 48 c7 c0 f2 ff ff ff c3
>> 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 66
>> 66 66 66 2e 0f 1f 84 00 00 00 00 00
>> Dec  8 22:30:11 avclub kernel: [  527.515842] RIP
>> [<ffffffff812be15b>] memcpy_c+0xb/0x20
>> Dec  8 22:30:11 avclub kernel: [  527.515850]  RSP <ffff880408985c18>
>> Dec  8 22:30:11 avclub kernel: [  527.515857] ---[ end trace
>> 5146b1cc8ebe8dc1 ]---
>> Dec  8 22:30:11 avclub kernel: [  527.515865] note: md2_raid6[528]
>> exited with preempt_count 2
>> Dec  8 22:32:52 avclub kernel: Kernel logging (proc) stopped.
>>
>> I believe that last line shows me giving up.  I am sad.
>> Thankfully, after rebooting into single user mode, I was able to mdadm
>> --assemble the array, and it appears to be working.  Boy that was a
>> rush!
>> $ uname -a
>> Linux avclub 2.6.32-32-generic #62-Ubuntu SMP Wed Apr 20
>> 21:52:38 UTC 2011 x86_64 GNU/Linux
>> Let me know if I can provide any other information.
>>
>
> Thanks for the report.
>
> It seems that as part of the reshape, md is trying to copy to an invalid
> memory address.
> It copies from 0xffff880369717000 (RSI) to 0xdb73880000000000 (RDI).
> The latter is clearly invalid.
>
> I have no idea how this might be happening. My best guess is that 'ddidx' in
> handle_stripe_expansion is getting a bad value but I cannot see how that
> would happen.
>
> If you have reasonable backups you could try again and see if it still fails.
> Maybe it was a one-off.

That appears to be the case.  After I rebooted, I was able to assemble
the raid, and it continued the sync with no apparent loss of data.
Naturally, a drive dropped out during that resync, and I'm now running
a series of resyncs to recover from that (the drive reappeared on
reboot, so I can't quite figure out what's up with it)....

Hopefully, sometime in the next day or two I'll have a computer that
works, hasn't lost data, and is not running a RAID resync!

If I can be of any help debugging the problem I had, let me know.
Otherwise, I hope you don't hear from me again.  :-D

- Gavin

>
> Not sure what else to suggest.  It might be fixed in a newer kernel, or it
> might not...
>
> NeilBrown
>
>
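
For anyone reading the trace later, the "clearly invalid" destination
can be checked mechanically. The sketch below is only an illustration,
not anything from md or mdadm: it assumes 48-bit virtual addressing
(which is what this x86_64 kernel would normally use), and the helper
name is_canonical is made up for the example. The register values are
copied from the oops above. On x86_64 an access to a non-canonical
address raises a general protection fault rather than a page fault,
which would be consistent with the "general protection fault" line.
The marked opcode bytes "f3 48 a5" appear to decode to "rep movsq", so
RCX should be the number of 8-byte words still to copy.

```python
# Sketch: classify x86-64 virtual addresses as canonical or not.
# Assumption: 48 implemented virtual-address bits (4-level paging).
VA_BITS = 48


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all equal."""
    top = addr >> (va_bits - 1)               # bits 63..47 when va_bits is 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits when va_bits is 48
    return top == 0 or top == all_ones


# Register values copied from the oops in this thread.
RSI = 0xFFFF880369717000  # copy source
RDI = 0xDB73880000000000  # copy destination
RCX = 0x200               # remaining count for the string copy

for name, value in (("RSI", RSI), ("RDI", RDI)):
    state = "canonical" if is_canonical(value) else "non-canonical"
    print(f"{name} = {value:#018x}: {state}")

# If the faulting instruction is rep movsq, RCX counts 8-byte words.
print(f"bytes left to copy: {RCX * 8}")
```

Under those assumptions the source address is canonical and the
destination is not, and the remaining count works out to 4096 bytes,
which matches the 0x1000 in R08 and suggests the copy faulted before
moving any data.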