Hello,
The kernel version is:
root@ubuntu:~# uname -a
Linux ubuntu 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

mdadm version is:
root@ubuntu:~# mdadm -V
mdadm - v3.1.4 - 31st August 2010

Examining the three array components:
root@ubuntu:~# mdadm -E /dev/sd{a,b,c}
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
           Name : vc:zvp_1123
  Creation Time : Mon Jun 6 21:10:38 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
     Array Size : 83879936 (40.00 GiB 42.95 GB)
  Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 8db90071:be80216e:09468262:1f5046b1
Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Jun 6 21:10:46 2011
       Checksum : 2e424556 - correct
         Events : 10
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 0
    Array State : A.A ('A' == active, '.' == missing)

/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
           Name : vc:zvp_1123
  Creation Time : Mon Jun 6 21:10:38 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
     Array Size : 83879936 (40.00 GiB 42.95 GB)
  Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 9f41313b:b1aa70f8:6cf0ca2f:c6ea0a64
Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Jun 6 21:10:44 2011
       Checksum : 2d23c61 - correct
         Events : 8
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 1
    Array State : AAA ('A' == active, '.' == missing)

/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x3
     Array UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
           Name : vc:zvp_1123
  Creation Time : Mon Jun 6 21:10:38 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 41940992 (20.00 GiB 21.47 GB)
     Array Size : 83879936 (40.00 GiB 42.95 GB)
  Used Dev Size : 41939968 (20.00 GiB 21.47 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 999424 sectors
          State : active
    Device UUID : 61189a9d:ec082cea:a3ba32fb:800fe84b
Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Jun 6 21:10:46 2011
       Checksum : a47a059 - correct
         Events : 10
         Layout : left-symmetric
     Chunk Size : 512K
    Device Role : Active device 2
    Array State : A.A ('A' == active, '.' == missing)

Details about the array:
root@ubuntu:~# mdadm -Q --detail /dev/md1123
/dev/md1123:
        Version : 1.2
  Creation Time : Mon Jun 6 21:10:38 2011
     Raid Level : raid5
     Array Size : 41939968 (40.00 GiB 42.95 GB)
  Used Dev Size : 20969984 (20.00 GiB 21.47 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Mon Jun 6 21:10:46 2011
          State : active, FAILED
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1
         Layout : left-symmetric
     Chunk Size : 512K
           Name : vc:zvp_1123
           UUID : b5802763:fd4790dd:ee8bdeb2:2418097f
         Events : 10

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      faulty spare rebuilding   /dev/sdb
       3       8       32        2      spare rebuilding   /dev/sdc

Basically, the problem is that the faulty component (and the rebuilding
spare) are not kicked out of the array, and the array stays stuck in this
state.

Thanks,
Alex.

2011/6/6 Nagilum <nagilum@xxxxxxxxxxx>:
> Make sure you provide all relevant details such as kernel version, mdadm
> version and maybe also mdadm -E /dev/sd{a,b,c}, mdadm -Q --detail /dev/md0,
> ..
>
> ----- Message from alex.bolshoy@xxxxxxxxx ---------
>     Date: Sun, 5 Jun 2011 22:41:55 +0300
>     From: Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>  Subject: RAID5: failing an active component during spare rebuild - arrays
> hangs
>       To: linux-raid@xxxxxxxxxxxxxxx
>
>
>> Hello everybody,
>> I am testing a scenario in which I create a RAID5 with three devices:
>> /dev/sd{a,b,c}. Since I don't supply --force to mdadm during creation,
>> it treats the array as degraded and starts rebuilding sdc as a
>> spare. This is as documented.
>>
>> Then I do --fail on /dev/sda. I understand that at this point my data
>> is gone, but I think I should still be able to tear down the array.
>>
>> Sometimes I see that /dev/sda is kicked from the array as faulty, and
>> /dev/sdc is also removed and marked as a spare. Then I am able to tear
>> down the array.
>>
>> But sometimes it looks like the system hits some kind of deadlock.
>> mdadm --detail produces:
>>
>>     Update Time : Sun Jun 5 21:54:34 2011
>>           State : active, FAILED
>>  Active Devices : 1
>> Working Devices : 2
>>  Failed Devices : 1
>>   Spare Devices : 1
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>            Name : ubuntu:zvp_1123
>>            UUID : 48a15fb6:b6410bb9:a2ca173e:0092032c
>>          Events : 67
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8        0        0      faulty spare rebuilding   /dev/sda
>>        1       8       16        1      active sync   /dev/sdb
>>        3       8       32        2      spare rebuilding   /dev/sdc
>>
>> So the faulty device and the spare are not kicked out of the array. At
>> this point I am unable to do anything with the array:
>>
>> root@ubuntu:~# sudo mdadm --stop /dev/md1123
>> mdadm: failed to stop array /dev/md1123: Device or resource busy
>> Perhaps a running process, mounted filesystem or active volume group?
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sda
>> mdadm: hot remove failed for /dev/sda: Device or resource busy
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdb
>> mdadm: hot remove failed for /dev/sdb: Device or resource busy
>> root@ubuntu:~# sudo mdadm /dev/md1123 --remove /dev/sdc
>> mdadm: hot remove failed for /dev/sdc: Device or resource busy
>>
>> This is happening on ubuntu-natty, with mdadm - v3.1.4 - 31st August 2010.
>> Looking at some code in mdadm/Detail.c, it looks like /dev/sda has
>> been marked only as MD_DISK_FAULTY, but has not yet been kicked out of
>> the array. The "spare" and "rebuilding" prints also result from that.
>>
>> The same thing also happens (sometimes) when I manually initiate a resync
>> (by writing 'repair' to 'sync_action') and later manually fail one
>> of the devices. Then I also saw messages like this in the syslog:
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350454] INFO: task
>> md1123_resync:7993 blocked for more than 120 seconds.
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350552] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350644] md1123_resync D
>> 0000000000000000 0 7993 2 0x00000004
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350647] ffff8800b56b1cd0
>> 0000000000000046 ffff8800b56b1fd8 ffff8800b56b0000
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350649] 0000000000013d00
>> ffff880036c09a98 ffff8800b56b1fd8 0000000000013d00
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350652] ffff8800b7f1adc0
>> ffff880036c096e0 ffff8800b56b1cb0 ffff880036c56610
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350654] Call Trace:
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350657] [<ffffffff81492885>]
>> md_do_sync+0xb45/0xc90
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350660] [<ffffffff81087940>] ?
>> autoremove_wake_function+0x0/0x40
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350663] [<ffffffff8107861b>] ?
>> recalc_sigpending+0x1b/0x50
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350665] [<ffffffff8148c516>]
>> md_thread+0x116/0x150
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350667] [<ffffffff8148c400>] ?
>> md_thread+0x0/0x150
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350669] [<ffffffff810871f6>]
>> kthread+0x96/0xa0
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350672] [<ffffffff8100cde4>]
>> kernel_thread_helper+0x4/0x10
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350674] [<ffffffff81087160>] ?
>> kthread+0x0/0xa0
>> Jun 5 21:42:00 ubuntu kernel: [ 2280.350676] [<ffffffff8100cde0>] ?
>> kernel_thread_helper+0x0/0x10
>>
>> This is pretty easy for me to reproduce.
>>
>> Basically, I would like to know what the user is expected to do when
>> more than one RAID5 array component fails during rebuild/resync.
>>
>> Thanks,
>> Alex.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
> ----- End message from alex.bolshoy@xxxxxxxxx -----
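In the `mdadm -E` output in Alex's reply, the failed member is visibly out of step with the other two: /dev/sda and /dev/sdc record Events 10 and Array State A.A, while /dev/sdb (the member failed in that run) still shows Events 8, Update Time 21:10:44 and AAA, presumably because md stopped writing its superblock once it was marked faulty. When reproducing the hang repeatedly, that comparison can be done mechanically. The sketch below is a hypothetical helper, not part of mdadm: the names `parse_examine` and `stale_members` are invented for illustration, it assumes only the `Key : Value` layout shown in this thread, and `SAMPLE` is an abridged copy of the dumps above.

```python
import re
from typing import Dict, List, Optional

# Abridged copy of the `mdadm -E /dev/sd{a,b,c}` output quoted in this thread;
# only the fields this sketch inspects are kept.
SAMPLE = """\
/dev/sda:
    Update Time : Mon Jun 6 21:10:46 2011
         Events : 10
    Device Role : Active device 0
    Array State : A.A ('A' == active, '.' == missing)
/dev/sdb:
    Update Time : Mon Jun 6 21:10:44 2011
         Events : 8
    Device Role : Active device 1
    Array State : AAA ('A' == active, '.' == missing)
/dev/sdc:
    Update Time : Mon Jun 6 21:10:46 2011
         Events : 10
    Device Role : Active device 2
    Array State : A.A ('A' == active, '.' == missing)
"""

# A line such as "/dev/sda:" starts a new device section.
DEVICE_RE = re.compile(r"^(/dev/\S+):\s*$")
# Assumed "Key : Value" layout; the key stops at the first colon, so values
# that themselves contain colons (UUIDs, times) are kept whole.
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.*)$")


def parse_examine(text: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: split `mdadm -E` text into {device: {field: value}}."""
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = devices.setdefault(dev.group(1), {})
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[field.group(1)] = field.group(2).strip()
    return devices


def stale_members(devices: Dict[str, Dict[str, str]]) -> List[str]:
    """Hypothetical helper: devices whose Events counter lags the newest one."""
    events = {dev: int(fields["Events"]) for dev, fields in devices.items()}
    newest = max(events.values())
    return sorted(dev for dev, count in events.items() if count < newest)


if __name__ == "__main__":
    members = parse_examine(SAMPLE)
    for dev, fields in members.items():
        state = fields["Array State"].split()[0]  # e.g. "A.A"
        print(f"{dev}: events={fields['Events']} state={state} "
              f"role={fields['Device Role']}")
    print("stale:", stale_members(members))
```

Run against the abridged sample, it prints one summary line per member and reports /dev/sdb as the only stale superblock. It compares on-disk metadata only and says nothing about why the md1123_resync thread blocks in md_do_sync.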