Re: power outage while raid5->raid6 was in progress

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 08, 2010 at 11:21:44AM +1000, Neil Brown wrote:
> On Thu, 8 Jul 2010 01:13:16 +0200
> Sebastian Reichel <elektranox@xxxxxxxxx> wrote:
> 
> > On Thu, Jul 08, 2010 at 08:44:50AM +1000, Neil Brown wrote:
> > > On Wed, 7 Jul 2010 22:41:10 +0200
> > > Sebastian Reichel <elektranox@xxxxxxxxx> wrote:
> > > 
> > > > Hi,
> > > > 
> > > > I have some problems with my raid. I tried updating from 5 disks raid5 to 8 disks
> > > > raid6 as described on http://neil.brown.name/blog/20090817000931#2. The command I
> > > > used was: mdadm --grow /dev/md0 --level=6 --raid-disk=8
> > > > 
> > > > While the rebuild was in progress my system hung, so I had to force power down it.
> > > > After rebooting the system I reassembled the raid. You can see the resulting mess
> > > > below. How can I recover from this state?
> > > 
> > > Please report the output of
> > > 
> > >   mdadm -E /dev/sd[efghijkl]1
> > > 
> > > then I'll see what can be done.
> > 
> > thank you for having a look at it :)
> 
> 
> It appears that the RAID5 -> RAID6 conversion (which is instantaneous, but
> results in a non-standard RAID6 parity layout) happened, but the 
> 6disk -> 8disk reshape which would have been combined with producing a
> more standard RAID6 parity layout did not even begin.
> I don't know why that would be.  Do you remember seeing the reshape being
> under-way in /proc/mdstat at all??
>
> If you didn't them I am very confused and the following is not at all
> reliable.  If you didn't and you only assumed a reshape was happening, then
> read on.
> 
> [...]
> 
> But if you are sure the reshape actually started the first time, don't do
> any of this.  Rather try to find some earlier kernel logs that show the
> reshape starting, and maybe show what caused the crash.

Yes, I saw the reshape via /proc/mdstat, it already passed 30% iirc.
Here are the interesting parts of the kernel log:

Jul  6 21:09:14 mars kernel: [ 7716.935683] RAID5 conf printout:
Jul  6 21:09:14 mars kernel: [ 7716.935688]  --- rd:5 wd:5
Jul  6 21:09:14 mars kernel: [ 7716.935691]  disk 0, o:1, dev:sdk1
Jul  6 21:09:14 mars kernel: [ 7716.935693]  disk 1, o:1, dev:sde1
Jul  6 21:09:14 mars kernel: [ 7716.935695]  disk 2, o:1, dev:sdl1
Jul  6 21:09:14 mars kernel: [ 7716.935696]  disk 3, o:1, dev:sdj1
Jul  6 21:09:14 mars kernel: [ 7716.935698]  disk 4, o:1, dev:sdi1
Jul  6 21:10:04 mars kernel: [ 7766.940183] raid5: device sdk1 operational as raid disk 0
Jul  6 21:10:04 mars kernel: [ 7766.940186] raid5: device sdi1 operational as raid disk 4
Jul  6 21:10:04 mars kernel: [ 7766.940189] raid5: device sdj1 operational as raid disk 3
Jul  6 21:10:04 mars kernel: [ 7766.940191] raid5: device sdl1 operational as raid disk 2
Jul  6 21:10:04 mars kernel: [ 7766.940193] raid5: device sde1 operational as raid disk 1
Jul  6 21:10:04 mars kernel: [ 7766.940840] raid5: allocated 6386kB for md0
Jul  6 21:10:04 mars kernel: [ 7766.952476] 0: w=1 pa=0 pr=6 m=2 a=18 r=6 op1=0 op2=0
Jul  6 21:10:04 mars kernel: [ 7766.952480] 4: w=2 pa=0 pr=6 m=2 a=18 r=6 op1=0 op2=0
Jul  6 21:10:04 mars kernel: [ 7766.952482] 3: w=3 pa=0 pr=6 m=2 a=18 r=6 op1=0 op2=0
Jul  6 21:10:04 mars kernel: [ 7766.952485] 2: w=4 pa=0 pr=6 m=2 a=18 r=6 op1=0 op2=0
Jul  6 21:10:04 mars kernel: [ 7766.952487] 1: w=5 pa=0 pr=6 m=2 a=18 r=6 op1=0 op2=0
Jul  6 21:10:04 mars kernel: [ 7766.952490] raid5: raid level 6 set md0 active with 5 out of 6 devices, algorithm 18
Jul  6 21:10:04 mars kernel: [ 7766.952516] RAID5 conf printout:
Jul  6 21:10:04 mars kernel: [ 7766.952517]  --- rd:6 wd:5
Jul  6 21:10:04 mars kernel: [ 7766.952520]  disk 0, o:1, dev:sdk1
Jul  6 21:10:04 mars kernel: [ 7766.952522]  disk 1, o:1, dev:sde1
Jul  6 21:10:04 mars kernel: [ 7766.952523]  disk 2, o:1, dev:sdl1
Jul  6 21:10:04 mars kernel: [ 7766.952525]  disk 3, o:1, dev:sdj1
Jul  6 21:10:04 mars kernel: [ 7766.952527]  disk 4, o:1, dev:sdi1
Jul  6 21:10:04 mars kernel: [ 7766.952536] ------------[ cut here ]------------
Jul  6 21:10:04 mars kernel: [ 7766.952542] WARNING: at /build/buildd-linux-2.6_2.6.34-1~experimental.2-amd64-zn0ozk/linux-2.6-2.6.34/debian/build/source_amd64_none/fs/sysfs/dir.c:451 sysfs_add_one+0xcc/0xe3()
Jul  6 21:10:04 mars kernel: [ 7766.952545] Hardware name: System Product Name
Jul  6 21:10:04 mars kernel: [ 7766.952547] sysfs: cannot create duplicate filename '/devices/virtual/block/md0/md/stripe_cache_size'
Jul  6 21:10:04 mars kernel: [ 7766.952549] Modules linked in: ip6t_LOG xt_hl nf_conntrack_ipv6 ipt_REJECT ipt_LOG xt_limit xt_tcpudp ipt_addrtype ipt_MASQUERADE iptable_nat xt_state ip6table_filter ip6_tables nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables nfsd nfs lockd fscache nfs_acl auth_rpcgss sunrpc pppoe pppox ppp_generic xfs exportfs rr272x_1x(P) hwmon_vid loop sha256_generic aes_x86_64 aes_generic cbc dm_crypt dm_mod arc4 ecb dvb_pll ath5k hfcpci mac80211 stv0299 b2c2_flexcop_pci mISDN_core ath b2c2_flexcop hisax dvb_core snd_pcm cx24123 snd_timer crc_ccitt cfg80211 cx24113 isdn snd rfkill s5h1420 soundcore i2c_nforce2 evdev pcspkr led_class snd_page_alloc edac_core edac_mce_amd i2c_core k8temp slhc tpm_tis tpm tpm_bios processor button asus_atk0110 ext3 jbd mbcache raid456 md_mod async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx ohci_hcd sd_mod crc_t10dif ata_generic fan firewire_ohci
Jul  6 21:10:04 mars kernel: ehci_hcd pata_amd firewire_core crc_itu_t thermal thermal_sys sata_nv usbcore nls_base libata forcedeth scsi_mod [last unloaded: scsi_wait_scan]
Jul  6 21:10:04 mars kernel: [ 7766.952622] Pid: 6844, comm: mdadm Tainted: P        W  2.6.34-1-amd64 #1
Jul  6 21:10:04 mars kernel: [ 7766.952624] Call Trace:
Jul  6 21:10:04 mars kernel: [ 7766.952630]  [<ffffffff81045655>] ? warn_slowpath_common+0x76/0x8c
Jul  6 21:10:04 mars kernel: [ 7766.952635]  [<ffffffff810456bd>] ? warn_slowpath_fmt+0x40/0x45
Jul  6 21:10:04 mars kernel: [ 7766.952638]  [<ffffffff81138209>] ? sysfs_add_one+0xcc/0xe3
Jul  6 21:10:04 mars kernel: [ 7766.952642]  [<ffffffff81137416>] ? sysfs_add_file_mode+0x4b/0x7d
Jul  6 21:10:04 mars kernel: [ 7766.952646]  [<ffffffff8113933c>] ? internal_create_group+0xdd/0x16b
Jul  6 21:10:04 mars kernel: [ 7766.952652]  [<ffffffffa01a0f67>] ? run+0x4fa/0x685 [raid456]
Jul  6 21:10:04 mars kernel: [ 7766.952660]  [<ffffffffa0181b20>] ? level_store+0x3b7/0x42e [md_mod]
Jul  6 21:10:04 mars kernel: [ 7766.952667]  [<ffffffffa017eb1f>] ? md_attr_store+0x77/0x96 [md_mod]
Jul  6 21:10:04 mars kernel: [ 7766.952670]  [<ffffffff81136ede>] ? sysfs_write_file+0xe3/0x11f
Jul  6 21:10:04 mars kernel: [ 7766.952674]  [<ffffffff810e7bc6>] ? vfs_write+0xa4/0x101
Jul  6 21:10:04 mars kernel: [ 7766.952677]  [<ffffffff810e7cd6>] ? sys_write+0x45/0x6b
Jul  6 21:10:04 mars kernel: [ 7766.952680]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
Jul  6 21:10:04 mars kernel: [ 7766.952682] ---[ end trace 2d8a2ef8dd7ca7b7 ]---
Jul  6 21:10:04 mars kernel: [ 7766.952687] raid5: failed to create sysfs attributes for md0
Jul  6 21:21:10 mars kernel: [ 8432.471242] md: md0 stopped.
[...]
Jul  6 21:25:23 mars kernel: [ 8685.659605] raid5: raid level 6 set md0 active with 5 out of 6 devices, algorithm 18
Jul  6 21:25:23 mars kernel: [ 8685.659631] RAID5 conf printout:
Jul  6 21:25:23 mars kernel: [ 8685.659632]  --- rd:6 wd:5
Jul  6 21:25:23 mars kernel: [ 8685.659634]  disk 0, o:1, dev:sdk1
Jul  6 21:25:23 mars kernel: [ 8685.659636]  disk 1, o:1, dev:sde1
Jul  6 21:25:23 mars kernel: [ 8685.659638]  disk 2, o:1, dev:sdl1
Jul  6 21:25:23 mars kernel: [ 8685.659640]  disk 3, o:1, dev:sdj1
Jul  6 21:25:23 mars kernel: [ 8685.659641]  disk 4, o:1, dev:sdi1
Jul  6 21:25:23 mars kernel: [ 8685.659677] md0: detected capacity change from 0 to 6001196793856
Jul  6 21:25:23 mars kernel: [ 8685.659813]  md0: unknown partition table
[...]
Jul  6 21:28:16 mars kernel: [ 8858.893402] md: recovery of RAID array md0
Jul  6 21:28:16 mars kernel: [ 8858.893405] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jul  6 21:28:16 mars kernel: [ 8858.893407] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jul  6 21:28:16 mars kernel: [ 8858.893422] md: using 128k window, over a total of 1465135936 blocks.
[...]
Jul  6 21:31:18 mars kernel: [ 9040.089434] rr272x_1x:Device error information 0x1000000
Jul  6 21:31:18 mars kernel: [ 9040.089440] rr272x_1x:Task file error, StatusReg=0x41, ErrReg=0x4, LBA[0-3]=0x17fbec6,LBA[4-7]=0x0.
[...] (more of these messages)
Jul  6 23:54:20 mars kernel: [17622.164846] sd 8:0:6:0: [sdk] Unhandled error code
Jul  6 23:54:20 mars kernel: [17622.164848] sd 8:0:6:0: [sdk] Result: hostbyte=DID_ABORT driverbyte=DRIVER_INVALID
Jul  6 23:54:20 mars kernel: [17622.164851] sd 8:0:6:0: [sdk] CDB: Read(10): 28 00 49 ff 72 3f 00 00 38 00
Jul  6 23:54:20 mars kernel: [17622.164858] end_request: I/O error, dev sdk, sector 1241477695
Jul  6 23:54:20 mars kernel: [17622.181377] raid5:md0: read error corrected (8 sectors at 1241477632 on sdk1)
Jul  6 23:54:20 mars kernel: [17622.181503] raid5:md0: read error corrected (8 sectors at 1241477640 on sdk1)
Jul  6 23:54:20 mars kernel: [17622.181507] raid5:md0: read error corrected (8 sectors at 1241477648 on sdk1)
Jul  6 23:54:20 mars kernel: [17622.181510] raid5:md0: read error corrected (8 sectors at 1241477656 on sdk1)
Jul  6 23:54:20 mars kernel: [17622.181512] raid5:md0: read error corrected (8 sectors at 1241477664 on sdk1)
Jul  6 23:54:20 mars kernel: [17622.181515] raid5:md0: read error corrected (8 sectors at 1241477672 on sdk1)
Jul  6 23:54:20 mars kernel: [17622.181518] raid5:md0: read error corrected (8 sectors at 1241477680 on sdk1)
[...] (more sdk failures)
Jul  6 23:58:38 mars kernel: [17880.544030] INFO: task md0_resync:13565 blocked for more than 120 seconds.
Jul  6 23:58:38 mars kernel: [17880.544109] md0_resync    D ffff88007cdf9500     0 13565      2 0x00000000
Jul  6 23:58:38 mars kernel: [17880.544114]  ffff88007cdf9500 0000000000000046 0000000000000086 ffff88007400a800
Jul  6 23:58:38 mars kernel: [17880.544117]  ffff88007ce25400 0000000000015200 0000000000015200 0000000000015200
Jul  6 23:58:38 mars kernel: [17880.544120]  ffff88006baf5fd8 0000000000015200 ffff88007cdf9500 ffff88006baf5fd8
Jul  6 23:58:38 mars kernel: [17880.544124] Call Trace:
Jul  6 23:58:38 mars kernel: [17880.544145]  [<ffffffffa019bdef>] ? unplug_slaves+0x70/0xa6 [raid456]
Jul  6 23:58:38 mars kernel: [17880.544150]  [<ffffffffa019c17e>] ? get_active_stripe+0x26f/0x54b [raid456]
Jul  6 23:58:38 mars kernel: [17880.544154]  [<ffffffff81040b38>] ? default_wake_function+0x0/0xf
Jul  6 23:58:38 mars kernel: [17880.544159]  [<ffffffffa019ef3b>] ? sync_request+0x245/0x2d1 [raid456]
Jul  6 23:58:38 mars kernel: [17880.544166]  [<ffffffffa01809d4>] ? is_mddev_idle+0xa2/0xf5 [md_mod]
Jul  6 23:58:38 mars kernel: [17880.544172]  [<ffffffffa018113a>] ? md_do_sync+0x713/0xb0f [md_mod]
Jul  6 23:58:38 mars kernel: [17880.544177]  [<ffffffff8105d495>] ? autoremove_wake_function+0x0/0x2a
Jul  6 23:58:38 mars kernel: [17880.544182]  [<ffffffffa0181c89>] ? md_thread+0xf2/0x110 [md_mod]
Jul  6 23:58:38 mars kernel: [17880.544188]  [<ffffffffa0181b97>] ? md_thread+0x0/0x110 [md_mod]
Jul  6 23:58:38 mars kernel: [17880.544190]  [<ffffffff8105d061>] ? kthread+0x75/0x7d
Jul  6 23:58:38 mars kernel: [17880.544194]  [<ffffffff810097e4>] ? kernel_thread_helper+0x4/0x10
Jul  6 23:58:38 mars kernel: [17880.544197]  [<ffffffff8105cfec>] ? kthread+0x0/0x7d
Jul  6 23:58:38 mars kernel: [17880.544199]  [<ffffffff810097e0>] ? kernel_thread_helper+0x0/0x10

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux