Anyone???...

On Mon, Jun 6, 2011 at 9:19 PM, Alexander Lyakas <alex.bolshoy@xxxxxxxxx> wrote:
>
> Hello,
>
> the kernel version is:
>
> root@ubuntu:~# uname -a
> Linux ubuntu 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC
> 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> mdadm version is:
>
> root@ubuntu:~# mdadm -V
> mdadm - v3.1.4 - 31st August 2010
>
> Examining the three array components:
>
> root@ubuntu:~# mdadm -E /dev/sd{a,b,c}
> /dev/sda:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
>            Name : vc:zvp_1123
>   Creation Time : Mon Jun 6 21:10:38 2011
>      Raid Level : raid5
>    Raid Devices : 3
>
>  Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
>      Array Size : 83879936 (40.00 GiB 42.95 GB)
>   Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 8db90071:be80216e:09468262:1f5046b1
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Mon Jun 6 21:10:46 2011
>        Checksum : 2e424556 - correct
>          Events : 10
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>     Device Role : Active device 0
>     Array State : A.A ('A' == active, '.' == missing)
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
>            Name : vc:zvp_1123
>   Creation Time : Mon Jun 6 21:10:38 2011
>      Raid Level : raid5
>    Raid Devices : 3
>
>  Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
>      Array Size : 83879936 (40.00 GiB 42.95 GB)
>   Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 9f41313b:b1aa70f8:6cf0ca2f:c6ea0a64
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Mon Jun 6 21:10:44 2011
>        Checksum : 2d23c61 - correct
>          Events : 8
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>     Device Role : Active device 1
>     Array State : AAA ('A' == active, '.' == missing)
> /dev/sdc:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x3
>      Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
>            Name : vc:zvp_1123
>   Creation Time : Mon Jun 6 21:10:38 2011
>      Raid Level : raid5
>    Raid Devices : 3
>
>  Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
>      Array Size : 83879936 (40.00 GiB 42.95 GB)
>   Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
> Recovery Offset : 999424 sectors
>           State : active
>     Device UUID : 61189a9d:ec082cea:a3ba32fb:800fe84b
>
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Mon Jun 6 21:10:46 2011
>        Checksum : a47a059 - correct
>          Events : 10
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>     Device Role : Active device 2
>     Array State : A.A ('A' == active, '.' == missing)
>
> Details about the array:
>
> root@ubuntu:~# mdadm -Q --detail /dev/md1123
> /dev/md1123:
>         Version : 1.2
>   Creation Time : Mon Jun 6 21:10:38 2011
>      Raid Level : raid5
>      Array Size : 41939968 (40.00 GiB 42.95 GB)
>   Used Dev Size : 20969984 (20.00 GiB 21.47 GB)
>    Raid Devices : 3
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Mon Jun 6 21:10:46 2011
>           State : active, FAILED
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 1
>   Spare Devices : 1
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : vc:zvp_1123
>            UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
>          Events : 10
>
>     Number   Major   Minor   RaidDevice State
>        0       8        0        0      active sync   /dev/sda
>        1       8       16        1      faulty spare rebuilding   /dev/sdb
>        3       8       32        2      spare rebuilding   /dev/sdc
>
> Basically, the thing is that the faulty component (and the rebuilding
> spare) are not kicked out of the array, and the array is stuck in this
> state.
>
> Thanks,
> Alex.
>
>
> 2011/6/6 Nagilum <nagilum@xxxxxxxxxxx>:
> > Make sure you provide all relevant details such as kernel version, mdadm
> > version and maybe also mdadm -E /dev/sd{a,b,c}, mdadm -Q --detail /dev/md0,
> > ..
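[Editorial aside: one detail worth noticing in the -E output above is that /dev/sdb's Events counter (8) lags the other two members (10), which is consistent with it being the stale/faulty member. A small, hypothetical Python sketch of that comparison, working on excerpts copied from the output quoted above (this is just an analysis-side helper, not anything mdadm provides):]

```python
# Sketch: compare the Events counters reported by "mdadm -E" on each
# member to spot a stale device (excerpts taken from the thread above).
import re

examine_output = """
/dev/sda:
         Events : 10
/dev/sdb:
         Events : 8
/dev/sdc:
         Events : 10
"""

events = {}
current = None
for line in examine_output.splitlines():
    m = re.match(r'(/dev/\w+):', line.strip())
    if m:
        current = m.group(1)
        continue
    m = re.match(r'Events\s*:\s*(\d+)', line.strip())
    if m and current:
        events[current] = int(m.group(1))

newest = max(events.values())
stale = [dev for dev, ev in events.items() if ev < newest]
print(events)   # {'/dev/sda': 10, '/dev/sdb': 8, '/dev/sdc': 10}
print(stale)    # ['/dev/sdb']
```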
> >
> > ----- Message from alex.bolshoy@xxxxxxxxx ---------
> >     Date: Sun, 5 Jun 2011 22:41:55 +0300
> >     From: Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> >  Subject: RAID5: failing an active component during spare rebuild - array hangs
> >       To: linux-raid@xxxxxxxxxxxxxxx
> >
> >
> >> Hello everybody,
> >> I am testing a scenario, in which I create a RAID5 with three devices:
> >> /dev/sd{a,b,c}. Since I don't supply --force to mdadm during creation,
> >> it treats the array as degraded and starts rebuilding sdc as a
> >> spare. This is as documented.
> >>
> >> Then I do --fail on /dev/sda. I understand that at this point my data
> >> is gone, but I think I should still be able to tear down the array.
> >>
> >> Sometimes I see that /dev/sda is kicked from the array as faulty, and
> >> /dev/sdc is also removed and marked as a spare. Then I am able to tear
> >> down the array.
> >>
> >> But sometimes, it looks like the system hits some kind of a deadlock.
> >> mdadm --detail produces:
> >>
> >>     Update Time : Sun Jun 5 21:54:34 2011
> >>           State : active, FAILED
> >>  Active Devices : 1
> >> Working Devices : 2
> >>  Failed Devices : 1
> >>   Spare Devices : 1
> >>
> >>          Layout : left-symmetric
> >>      Chunk Size : 512K
> >>
> >>            Name : ubuntu:zvp_1123
> >>            UUID : 48a15fb6:b6410bb9:a2ca173e:0092032c
> >>          Events : 67
> >>
> >>     Number   Major   Minor   RaidDevice State
> >>        0       8        0        0      faulty spare rebuilding   /dev/sda
> >>        1       8       16        1      active sync   /dev/sdb
> >>        3       8       32        2      spare rebuilding   /dev/sdc
> >>
> >> So the faulty device and the spare are not kicked out of the array. At
> >> this point I am unable to do anything with the array:
> >>
> >> root@ubuntu:~# sudo mdadm --stop /dev/md1123
> >> mdadm: failed to stop array /dev/md1123: Device or resource busy
> >> Perhaps a running process, mounted filesystem or active volume group?
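[Editorial aside: the wedged state described in the quoted message is easy to detect mechanically from the --detail device table: a member reported as both "faulty" and "rebuilding" has been marked failed but never actually removed from its raid slot. A small illustrative Python check over the table copied from the message (purely a monitoring-side sketch, not part of mdadm):]

```python
# Illustrative check: flag members that "mdadm --detail" reports as both
# faulty and still rebuilding -- the stuck condition described above.
detail_table = """\
   0       8        0        0      faulty spare rebuilding   /dev/sda
   1       8       16        1      active sync   /dev/sdb
   3       8       32        2      spare rebuilding   /dev/sdc
"""

stuck = []
for line in detail_table.splitlines():
    fields = line.split()
    # columns: Number, Major, Minor, RaidDevice, <state words...>, device
    state, dev = " ".join(fields[4:-1]), fields[-1]
    if "faulty" in state and "rebuilding" in state:
        stuck.append(dev)

print(stuck)  # ['/dev/sda']
```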
> >> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sda
> >> mdadm: hot remove failed for /dev/sda: Device or resource busy
> >> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdb
> >> mdadm: hot remove failed for /dev/sdb: Device or resource busy
> >> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdc
> >> mdadm: hot remove failed for /dev/sdc: Device or resource busy
> >>
> >> This is happening on ubuntu-natty, with mdadm - v3.1.4 - 31st August 2010.
> >> Looking at some code in mdadm/Detail.c, it looks like /dev/sda has
> >> been marked only as MD_DISK_FAULTY, but has not yet been kicked out of
> >> the array. The "spare" and "rebuilding" prints also result from that.
> >>
> >> The same thing also happens (sometimes) when I manually initiate a resync
> >> (by writing 'repair' to 'sync_action') and then manually fail one of
> >> the devices. Then I also saw messages like this in the syslog:
> >>
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350454] INFO: task md1123_resync:7993 blocked for more than 120 seconds.
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350644] md1123_resync   D 0000000000000000     0  7993      2 0x00000004
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350647]  ffff8800b56b1cd0 0000000000000046 ffff8800b56b1fd8 ffff8800b56b0000
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350649]  0000000000013d00 ffff880036c09a98 ffff8800b56b1fd8 0000000000013d00
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350652]  ffff8800b7f1adc0 ffff880036c096e0 ffff8800b56b1cb0 ffff880036c56610
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350654] Call Trace:
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350657]  [<ffffffff81492885>] md_do_sync+0xb45/0xc90
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350660]  [<ffffffff81087940>] ? autoremove_wake_function+0x0/0x40
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350663]  [<ffffffff8107861b>] ? recalc_sigpending+0x1b/0x50
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350665]  [<ffffffff8148c516>] md_thread+0x116/0x150
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350667]  [<ffffffff8148c400>] ? md_thread+0x0/0x150
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350669]  [<ffffffff810871f6>] kthread+0x96/0xa0
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350672]  [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350674]  [<ffffffff81087160>] ? kthread+0x0/0xa0
> >> Jun  5 21:42:00 ubuntu kernel: [ 2280.350676]  [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
> >>
> >> This is pretty easy for me to reproduce.
> >>
> >> Basically, I would like to know what the user is expected to do when
> >> more than one RAID5 array component fails during rebuild/resync.
> >>
> >> Thanks,
> >> Alex.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> > ----- End message from alex.bolshoy@xxxxxxxxx -----
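[Editorial aside, on the Detail.c observation in the quoted message: the per-device state column is assembled from the disk's state bits, so a disk flagged MD_DISK_FAULTY that still occupies a raid slot naturally prints as both "faulty" and "spare rebuilding". The following is a much-simplified, hypothetical Python rendering of that idea; the bit numbers come from linux/raid/md_p.h, but the string-building logic here is an illustration, not the actual mdadm code:]

```python
# Simplified illustration of how a per-disk state word could turn into
# the strings seen in "mdadm --detail" (NOT the actual Detail.c logic).
MD_DISK_FAULTY = 0   # bit numbers from linux/raid/md_p.h
MD_DISK_ACTIVE = 1
MD_DISK_SYNC = 2

def state_string(state, raid_disk):
    words = []
    if state & (1 << MD_DISK_FAULTY):
        words.append("faulty")
    if state & (1 << MD_DISK_ACTIVE):
        words.append("active")
    if state & (1 << MD_DISK_SYNC):
        words.append("sync")
    if not state & (1 << MD_DISK_SYNC) and raid_disk >= 0:
        # occupies a raid slot but is not in sync: shown as rebuilding spare
        words.append("spare rebuilding")
    return " ".join(words)

# /dev/sda from the report above: marked faulty, still holding slot 0
print(state_string(1 << MD_DISK_FAULTY, 0))   # faulty spare rebuilding
# /dev/sdb: a healthy, in-sync member
print(state_string((1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC), 1))   # active sync
```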