Re: Adding a disk to RAID0

Victor Balakine <victor.balakine@xxxxxx> · Tue, 06 Mar 2012 11:10:52 -0800

# cat /proc/1506/stack
[<ffffffff8003a1e5>] __cond_resched+0x25/0x40
[<ffffffffa0102ebf>] raid5d+0x26f/0x3d0 [raid456]
[<ffffffff803c7a36>] md_thread+0x106/0x140
[<ffffffff8006444e>] kthread+0x7e/0x90
[<ffffffff80510d24>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

And this is what I see on the system console:
[  411.331287] md: bind<xvda2>
[  411.353737] md: raid0 personality registered for level 0
[  411.354362] bio: create slab <bio-1> at 1
[  411.354377] md/raid0:md0: looking at xvda2
[  411.354382] md/raid0:md0:   comparing xvda2(8386560) with xvda2(8386560)
[  411.354389] md/raid0:md0:   END
[  411.354393] md/raid0:md0:   ==> UNIQUE
[  411.354397] md/raid0:md0: 1 zones
[  411.354400] md/raid0:md0: FINAL 1 zones
[  411.354409] md/raid0:md0: done.
[  411.354414] md/raid0:md0: md_size is 8386560 sectors.
[  411.354418] ******* md0 configuration *********
[  411.354424] zone0=[xvda2/]
[  411.354430]         zone offset=0kb device offset=0kb size=4193280kb
[  411.354434] **********************************
[  411.354436]
[  411.354451] md0: detected capacity change from 0 to 4293918720
[  411.372921]  md0: p1
[  434.228901] md/raid:md0: device xvda2 operational as raid disk 0
[  434.229104] md/raid:md0: allocated 2176kB
[  434.229159] md/raid:md0: raid level 4 active with 1 out of 2 devices, algorithm 5
[  434.306479] md: bind<xvda3>
[  434.405827] md: reshape of RAID array md0
[  434.405839] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  434.405844] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[  434.405851] md: using 128k window, over a total of 4193280k.

And a little while later:
[  960.220050] INFO: task md0_reshape:1508 blocked for more than 480 seconds.
[  960.220068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  960.220077] md0_reshape     D 0000000000000000     0  1508      2 0x00000000
[  960.220087]  ffff88001e69fbc0 0000000000000246 ffff88001e10c1c0 ffffffffa0100c45
[  960.220097]  ffff88001e69ffd8 ffff88001e10c1c0 ffff88001e69ffd8 ffff88001e10c1c0
[  960.220106]  ffff88001e038440 ffff88001e10c1c0 0000000000000001 0000000000000000
[  960.220119] Call Trace:
[  960.220141]  [<ffffffffa0101f6d>] reshape_request+0x57d/0x930 [raid456]
[  960.220165]  [<ffffffffa010266e>] sync_request+0x23e/0x2c0 [raid456]
[  960.220183]  [<ffffffff803cae48>] md_do_sync+0x748/0xd10
[  960.220194]  [<ffffffff803c7a36>] md_thread+0x106/0x140
[  960.220204]  [<ffffffff8006444e>] kthread+0x7e/0x90
[  960.220216]  [<ffffffff80510d24>] kernel_thread_helper+0x4/0x10

Victor

On 2012-03-05 17:21, NeilBrown wrote:
On Mon, 05 Mar 2012 15:35:15 -0800 Victor Balakine <victor.balakine@xxxxxx> wrote:

Am I the only one having problems adding disks to RAID0? Has anybody
tried that on a 3.* kernel?

Strange.  It works for me.

We need to find out what the md0_raid0 process is doing.
Can you
    cat /proc/PROCESSID/stack

and see what that shows?
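
A minimal sketch of that step, assuming a POSIX shell, the procps `pgrep` utility, and root privileges (reading /proc/PID/stack normally requires them). The helper name `dump_thread_stack` is hypothetical, not part of mdadm; it finds every thread whose name exactly matches the argument, such as md0_raid0, and prints its kernel stack:

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): print the kernel stack of every
# thread whose name exactly matches $1, e.g. md0_raid0 or md0_reshape.
# Assumes procps pgrep; reading /proc/PID/stack normally requires root.
dump_thread_stack() {
    name="$1"
    # pgrep -x matches the whole process name and exits 1 when nothing matches
    pids=$(pgrep -x "$name") || {
        echo "no thread named $name" >&2
        return 1
    }
    for pid in $pids; do
        echo "== $name (pid $pid) =="
        cat "/proc/$pid/stack"
    done
}
```

Usage: `dump_thread_stack md0_raid0`. Running it several times in a row can help tell whether a thread that sits at 100% CPU keeps returning to the same call chain.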

NeilBrown

Victor

On 2012-02-28 15:34, Victor Balakine wrote:
I am trying to add another disk to RAID0 and this functionality appears
to be broken.
First I create a RAID0 array:
# mdadm --create /dev/md0 --level=0 --raid-devices=1 --force /dev/xvda2
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

So far everything works fine. Then I add another disk to it:
# mdadm --grow /dev/md0 --raid-devices=2 --add /dev/xvda3 --backup-file=/backup-md0
mdadm: level of /dev/md0 changed to raid4
mdadm: added /dev/xvda3
mdadm: Need to backup 1024K of critical section..

This is what I see in /var/log/messages:
Feb 28 15:03:30 storage kernel: [ 1420.174022] md: bind<xvda2>
Feb 28 15:03:30 storage kernel: [ 1420.209167] md: raid0 personality registered for level 0
Feb 28 15:03:30 storage kernel: [ 1420.209818] bio: create slab <bio-1> at 1
Feb 28 15:03:30 storage kernel: [ 1420.209832] md/raid0:md0: looking at xvda2
Feb 28 15:03:30 storage kernel: [ 1420.209837] md/raid0:md0: comparing xvda2(8386560) with xvda2(8386560)
Feb 28 15:03:30 storage kernel: [ 1420.209844] md/raid0:md0: END
Feb 28 15:03:30 storage kernel: [ 1420.209851] md/raid0:md0: ==> UNIQUE
Feb 28 15:03:30 storage kernel: [ 1420.209856] md/raid0:md0: 1 zones
Feb 28 15:03:30 storage kernel: [ 1420.209859] md/raid0:md0: FINAL 1 zones
Feb 28 15:03:30 storage kernel: [ 1420.209866] md/raid0:md0: done.
Feb 28 15:03:30 storage kernel: [ 1420.209870] md/raid0:md0: md_size is 8386560 sectors.
Feb 28 15:03:30 storage kernel: [ 1420.209875] ******* md0 configuration *********
Feb 28 15:03:30 storage kernel: [ 1420.209879] zone0=[xvda2/]
Feb 28 15:03:30 storage kernel: [ 1420.209885] zone offset=0kb device offset=0kb size=4193280kb
Feb 28 15:03:30 storage kernel: [ 1420.209902] **********************************
Feb 28 15:03:30 storage kernel: [ 1420.209903]
Feb 28 15:03:30 storage kernel: [ 1420.209919] md0: detected capacity change from 0 to 4293918720
Feb 28 15:03:30 storage kernel: [ 1420.223968] md0: p1
...
Feb 28 15:04:01 storage kernel: [ 1450.783016] async_tx: api initialized (async)
Feb 28 15:04:01 storage kernel: [ 1450.796912] xor: automatically using best checksumming function: generic_sse
Feb 28 15:04:01 storage kernel: [ 1450.816012] generic_sse: 9509.000 MB/sec
Feb 28 15:04:01 storage kernel: [ 1450.816021] xor: using function: generic_sse (9509.000 MB/sec)
Feb 28 15:04:01 storage kernel: [ 1450.912021] raid6: int64x1 1888 MB/s
Feb 28 15:04:01 storage kernel: [ 1450.980013] raid6: int64x2 2707 MB/s
Feb 28 15:04:01 storage kernel: [ 1451.048025] raid6: int64x4 2073 MB/s
Feb 28 15:04:01 storage kernel: [ 1451.116039] raid6: int64x8 2010 MB/s
Feb 28 15:04:01 storage kernel: [ 1451.184017] raid6: sse2x1 4764 MB/s
Feb 28 15:04:01 storage kernel: [ 1451.252018] raid6: sse2x2 5170 MB/s
Feb 28 15:04:01 storage kernel: [ 1451.320016] raid6: sse2x4 7548 MB/s
Feb 28 15:04:01 storage kernel: [ 1451.320025] raid6: using algorithm sse2x4 (7548 MB/s)
Feb 28 15:04:01 storage kernel: [ 1451.330136] md: raid6 personality registered for level 6
Feb 28 15:04:01 storage kernel: [ 1451.330145] md: raid5 personality registered for level 5
Feb 28 15:04:01 storage kernel: [ 1451.330149] md: raid4 personality registered for level 4
Feb 28 15:04:01 storage kernel: [ 1451.330662] md/raid:md0: device xvda2 operational as raid disk 0
Feb 28 15:04:01 storage kernel: [ 1451.330820] md/raid:md0: allocated 2176kB
Feb 28 15:04:01 storage kernel: [ 1451.330869] md/raid:md0: raid level 4 active with 1 out of 2 devices, algorithm 5
Feb 28 15:04:01 storage kernel: [ 1451.330874] RAID conf printout:
Feb 28 15:04:01 storage kernel: [ 1451.330876] --- level:4 rd:2 wd:1
Feb 28 15:04:01 storage kernel: [ 1451.330878] disk 0, o:1, dev:xvda2
Feb 28 15:04:01 storage kernel: [ 1451.417995] md: bind<xvda3>
Feb 28 15:04:01 storage kernel: [ 1451.616399] RAID conf printout:
Feb 28 15:04:01 storage kernel: [ 1451.616404] --- level:4 rd:3 wd:2
Feb 28 15:04:01 storage kernel: [ 1451.616408] disk 0, o:1, dev:xvda2
Feb 28 15:04:01 storage kernel: [ 1451.616411] disk 1, o:1, dev:xvda3
Feb 28 15:04:01 storage kernel: [ 1451.619054] md: reshape of RAID array md0
Feb 28 15:04:01 storage kernel: [ 1451.619066] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 28 15:04:01 storage kernel: [ 1451.619069] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Feb 28 15:04:01 storage kernel: [ 1451.619075] md: using 128k window, over a total of 4193280k.
Feb 28 15:05:02 storage udevd[280]: timeout '/sbin/blkid -o udev -p /dev/md0'
Feb 28 15:05:03 storage udevd[280]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [1829]
Feb 28 15:05:04 storage udevd[280]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [1829]
Feb 28 15:05:05 storage udevd[280]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [1829]

And then it just goes on forever. The md0_raid0 process stays at 100% CPU load.
# ps -ef | grep md0
root 7268 2 99 09:34 ? 05:53:00 [md0_raid0]
root 7270 2 0 09:34 ? 00:00:00 [md0_reshape]
root 7271 1 0 09:34 pts/0 00:00:00 mdadm --grow /dev/md0 --raid-devices=2 --add /dev/sdc1 --backup-file=/backup-md0

# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md0 : active raid4 xvda3[2] xvda2[0]
      4193280 blocks super 1.2 level 4, 512k chunk, algorithm 5 [3/1] [U__]
      resync=DELAYED

unused devices: <none>

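The `[3/1] [U__]` part of the status line is the quickest thing to check in that output: the first number is how many member slots the array is configured for, the second is how many are currently working, and each `U` or `_` marks one slot as up or missing. A minimal sketch for pulling those counts out, assuming awk and the /proc/mdstat layout shown above; the function name `md_disk_counts` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: for each array in /proc/mdstat-formatted input,
# print "<array> <configured slots> <working slots>" taken from the
# "[n/m]" field on the array's status line.
md_disk_counts() {
    awk '
        # An array entry starts with a line like "md0 : active raid4 ..."
        /^md[0-9]+ :/ { array = $1 }
        # The first "[n/m]" after that line holds the slot counts
        array != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            print array, n[1], n[2]
            array = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Run without arguments it reads /proc/mdstat; given a saved copy of the output above it prints `md0 3 1`, i.e. three configured slots (matching `rd:3` in the log) with only one working.
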
# mdadm --version
mdadm - v3.2.2 - 17th June 2011
# uname -a
Linux storage 3.1.9-1.4-xen #1 SMP Fri Jan 27 08:55:10 UTC 2012 (efb5ff4) x86_64 x86_64 x86_64 GNU/Linux

It's OpenSUSE 12.1 with all the latest updates, running in a Xen VM that I
created to reproduce the problem. The actual server runs the same
version of OpenSUSE (Linux san1 3.1.9-1.4-desktop #1 SMP PREEMPT Fri Jan
27 08:55:10 UTC 2012 (efb5ff4) x86_64 x86_64 x86_64 GNU/Linux) on
physical hardware. If you need any more information I can easily get it,
since it's a VM and the problem is easily reproducible.

Victor
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Victor Balakine
Network Systems Administrator | Continuing Studies | Information Technology
The University of British Columbia | Vancouver Campus
Phone 604 822 1496
victor.balakine@xxxxxx