Is there nobody who can give me any additional information on this?

Executive summary: the machine freezes with the kernel dump below whenever
stripe_cache_size > 256. Please help if you can; running at 256 is killing
performance. (Rough sketches of how the array was built and how the hang can
be reproduced are appended at the end of this mail.)

On Thu, Nov 19, 2009 at 7:53 PM, Enigma <enigma@xxxxxxxxxxxxxxxxxx> wrote:
> I am in the process of migrating an 8x200 GB disk RAID 6 array to an
> 8x500 GB disk array. I created the array with 2 missing disks and added
> them after the array was started. The array synced fine at the default
> of 256 for /sys/block/md0/md/stripe_cache_size, but if I change it to a
> higher value, for example
> "echo 4096 > /sys/block/md0/md/stripe_cache_size", the system freezes up.
> The previous array ran fine with a cache size of 8192. The only
> difference between my old array and this one is that I increased the
> chunk size from 256 to 512. The machine is a dual Xeon with
> hyperthreading, 3 GB of main memory, kernel 2.6.29.1, mdadm v2.6.7.2.
> I let the array sync at the default cache size (with fairly poor
> performance), tested the synced array, and got the same behavior under
> load. Whenever the cache size is > 256 I get the following hang:
>
> [ 1453.847111] BUG: soft lockup - CPU#3 stuck for 61s! [md0_raid5:571]
> [ 1453.863456] Modules linked in: ipv6 dm_mod iTCO_wdt intel_rng
> rng_core pcspkr evdev i2c_i801 i2c_core e7xxx_edac edac_core
> parport_pc parport container
> [ 1453.919458]
> [ 1453.923455] Pid: 571, comm: md0_raid5 Not tainted (2.6.29.1-JJ #7) SE7501CW2
> [ 1453.943454] EIP: 0060:[<c033ec4e>] EFLAGS: 00000286 CPU: 3
> [ 1453.959453] EIP is at raid6_sse22_gen_syndrome+0x132/0x16c
> [ 1453.979454] EAX: dcca66c0 EBX: ffffffff ECX: 000006c0 EDX: dd1be000
> [ 1453.995452] ESI: f6005e60 EDI: f6005e5c EBP: 00000014 ESP: f6005e30
> [ 1454.015452] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 1454.031451] CR0: 80050033 CR2: b7ede195 CR3: 066e8000 CR4: 000006d0
> [ 1454.051451] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 1454.071450] DR6: ffff0ff0 DR7: 00000400
> [ 1454.083450] Call Trace:
> [ 1454.087450] [<c033adc1>] ? compute_parity6+0x201/0x26c
> [ 1454.103449] [<c033b7b2>] ? handle_stripe+0x6bc/0xad0
> [ 1454.119449] [<c015537c>] ? rcu_process_callbacks+0x33/0x39
> [ 1454.139449] [<c012a24e>] ? __do_softirq+0x7f/0x125
> [ 1454.151448] [<c033bf6f>] ? raid5d+0x3a9/0x3b7
> [ 1454.167448] [<c03d1b87>] ? schedule_timeout+0x13/0x86
> [ 1454.179447] [<c01176f5>] ? default_spin_lock_flags+0x5/0x8
> [ 1454.199447] [<c0347c76>] ? md_thread+0xb6/0xcc
> [ 1454.211446] [<c0135a11>] ? autoremove_wake_function+0x0/0x2d
> [ 1454.231446] [<c0347bc0>] ? md_thread+0x0/0xcc
> [ 1454.243446] [<c0135952>] ? kthread+0x38/0x5e
> [ 1454.255445] [<c013591a>] ? kthread+0x0/0x5e
> [ 1454.267445] [<c0103b93>] ? kernel_thread_helper+0x7/0x10
>
> In searching for a cause I have found a few other people who had issues
> like this, but they all seemed to be on an older kernel where the cause
> was a deadlock that should already be resolved in my version
> (e.g. http://marc.info/?l=linux-raid&m=116946415327616&w=2).
> Are there any known bugs in my kernel that would cause behavior like this?
> Here is some info about the array:
>
> # mdadm --examine /dev/sda2
> /dev/sda2:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 65f266b7:852d5253:a847f9a3:2c253025
>   Creation Time : Thu Nov 19 01:57:33 2009
>      Raid Level : raid6
>   Used Dev Size : 401118720 (382.54 GiB 410.75 GB)
>      Array Size : 2406712320 (2295.22 GiB 2464.47 GB)
>    Raid Devices : 8
>   Total Devices : 8
> Preferred Minor : 0
>
>     Update Time : Thu Nov 19 19:40:26 2009
>           State : clean
>  Active Devices : 8
> Working Devices : 8
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 16b3ddef - correct
>          Events : 1150
>
>      Chunk Size : 512K
>
>       Number   Major   Minor   RaidDevice   State
> this     0       8        2        0        active sync   /dev/sda2
>
>    0     0       8        2        0        active sync   /dev/sda2
>    1     1       8       18        1        active sync   /dev/sdb2
>    2     2       8       34        2        active sync   /dev/sdc2
>    3     3       8       50        3        active sync   /dev/sdd2
>    4     4       8       66        4        active sync   /dev/sde2
>    5     5       8       98        5        active sync   /dev/sdg2
>    6     6       8       82        6        active sync   /dev/sdf2
>    7     7       8      114        7        active sync   /dev/sdh2
>
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
> md1 : active raid1 hdc1[1] hda1[0]
>       4200896 blocks [2/2] [UU]
>
> md0 : active raid6 sdh2[7] sdg2[5] sdf2[6] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
>       2406712320 blocks level 6, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
>
> unused devices: <none>
>
> Can anyone point me at some information to debug this problem?
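For completeness, the migration procedure described above looks roughly like
the commands below. This is only a sketch: which two partitions started out as
the "missing" slots is a guess, and the data-copy step between creation and
the adds is omitted.

  # create the new 8-device RAID 6 with a 512K chunk and two slots left
  # empty; the array comes up degraded but usable
  mdadm --create /dev/md0 --level=6 --chunk=512 --raid-devices=8 \
      /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2 \
      missing missing

  # (copy the data over from the old array here)

  # then add the last two partitions so md rebuilds them into the empty slots
  mdadm --add /dev/md0 /dev/sdg2
  mdadm --add /dev/md0 /dev/sdh2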
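And in case it helps to narrow down where things start to go wrong, here is a
minimal sketch of stepping the cache size up under write load while watching
the kernel log. The size list, the one-minute soak per step, and the /mnt/md0
scratch path are placeholders, not part of the setup above. As I understand
it, the stripe cache pins stripe_cache_size pages per member device, so 4096
on an 8-disk array is only about 128 MB, which should be well within 3 GB of
RAM.

  #!/bin/sh
  # raise stripe_cache_size one step at a time and watch for soft lockups
  SYSFS=/sys/block/md0/md/stripe_cache_size

  for size in 384 512 768 1024 2048 4096; do
      echo "=== stripe_cache_size=$size ==="
      echo "$size" > "$SYSFS"

      # a streaming write as a stand-in for the real workload
      dd if=/dev/zero of=/mnt/md0/scratch.bin bs=1M count=4096 conv=fsync \
          > /dev/null 2>&1 &
      DD_PID=$!
      sleep 60
      kill "$DD_PID" 2> /dev/null
      wait "$DD_PID" 2> /dev/null

      # any soft-lockup warning should show up here (if the box survives)
      dmesg | tail -n 25
  done

Stopping at the first size that triggers the warning would at least show
whether 256 is a hard threshold or just the default that happens to survive.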