I've been closely monitoring the kernel via the git web interfaces,
and I can't yet see a commit containing the patch that supposedly fixed
this. (Please correct me if it was actually committed.)
While a single 10s stuck CPU may not be serious, it *is* serious when it
happens over and over and over again consecutively (like it does in my
case).
Thanks,
Lee.
Majed B. wrote:
And it's not serious.
On Wed, Oct 21, 2009 at 8:01 AM, Majed B. <majedb@xxxxxxxxx> wrote:
Hello,
I believe this has been fixed in 2.6.30 or 2.6.31.
On Wed, Oct 21, 2009 at 5:46 AM, Steven Haigh <netwiz@xxxxxxxxx> wrote:
When trying to run a check using:
echo check > /sys/block/md2/md/sync_action
I got the following errors printed to the console:
Oct 21 13:31:03 wireless kernel: md: syncing RAID array md2
Oct 21 13:31:03 wireless kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Oct 21 13:31:03 wireless kernel: md: using maximum available idle IO bandwidth (but not more than 20000 KB/sec) for reconstruction.
Oct 21 13:31:03 wireless kernel: md: using 128k window, over a total of 300511808 blocks.
BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
Pid: 358, comm: md2_raid1
EIP: 0060:[<c04ec1bc>] CPU: 0
EIP is at memcmp+0xd/0x22
EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
EAX: 00000000 EBX: e2826fe0 ECX: d15f3fe0 EDX: 00000000
ESI: 00000020 EDI: 00000090 EBP: f70b8e40 DS: 007b ES: 007b
CR0: 8005003b CR2: 0806af70 CR3: 37872000 CR4: 000006d0
[<f8843c64>] raid1d+0x270/0xbea [raid1]
[<c0616870>] schedule+0x9cc/0xa55
[<c0616f33>] schedule_timeout+0x13/0x8c
[<c05a6b5e>] md_thread+0xdf/0xf5
[<c0434907>] autoremove_wake_function+0x0/0x2d
[<c05a6a7f>] md_thread+0x0/0xf5
[<c0434845>] kthread+0xc0/0xeb
[<c0434785>] kthread+0x0/0xeb
[<c0405c53>] kernel_thread_helper+0x7/0x10
=======================
Oct 21 13:37:50 wireless kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:358]
Oct 21 13:37:50 wireless kernel:
Oct 21 13:37:50 wireless kernel: Pid: 358, comm: md2_raid1
Oct 21 13:37:50 wireless kernel: EIP: 0060:[<c04ec1bc>] CPU: 0
Oct 21 13:37:50 wireless kernel: EIP is at memcmp+0xd/0x22
Oct 21 13:37:50 wireless kernel: EFLAGS: 00000202 Not tainted (2.6.18-164.el5 #1)
Oct 21 13:37:50 wireless kernel: EAX: 00000000 EBX: e2826fe0 ECX: d15f3fe0 EDX: 00000000
Oct 21 13:37:50 wireless kernel: ESI: 00000020 EDI: 00000090 EBP: f70b8e40 DS: 007b ES: 007b
Oct 21 13:37:50 wireless kernel: CR0: 8005003b CR2: 0806af70 CR3: 37872000 CR4: 000006d0
Oct 21 13:37:50 wireless kernel: [<f8843c64>] raid1d+0x270/0xbea [raid1]
Oct 21 13:37:50 wireless kernel: [<c0616870>] schedule+0x9cc/0xa55
Oct 21 13:37:50 wireless kernel: [<c0616f33>] schedule_timeout+0x13/0x8c
Oct 21 13:37:50 wireless kernel: [<c05a6b5e>] md_thread+0xdf/0xf5
Oct 21 13:37:51 wireless kernel: [<c0434907>] autoremove_wake_function+0x0/0x2d
Oct 21 13:37:51 wireless kernel: [<c05a6a7f>] md_thread+0x0/0xf5
Oct 21 13:37:51 wireless kernel: [<c0434845>] kthread+0xc0/0xeb
Oct 21 13:37:51 wireless kernel: [<c0434785>] kthread+0x0/0xeb
Oct 21 13:37:51 wireless kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Oct 21 13:37:51 wireless kernel: =======================
This is on CentOS 5.3 with kernel 2.6.18-164.el5 on i686.
Is this a serious type of error? Is there anything else I can supply to
help diagnose it further?
# mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.03
Creation Time : Mon Feb 23 17:15:41 2009
Raid Level : raid1
Array Size : 300511808 (286.59 GiB 307.72 GB)
Used Dev Size : 300511808 (286.59 GiB 307.72 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Wed Oct 21 13:46:28 2009
State : clean, resyncing
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Rebuild Status : 5% complete
UUID : fed99e3d:d08fdcc9:b9593a45:2cc09736
Events : 0.30584
    Number   Major   Minor   RaidDevice State
       0       3        3        0      active sync   /dev/hda3
       1      22        3        1      active sync   /dev/hdc3
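To put a number on how often the lockup repeats during a check (the "over and over again" concern raised in this thread), a small shell helper could tally the soft-lockup reports per stuck thread from a syslog stream. This is only a sketch: the function name count_soft_lockups is hypothetical, and it assumes each "BUG: soft lockup" report sits on a single syslog line ending in a "[thread:pid]" tag, as in /var/log/messages. The line breaks inside the quoted log above are most likely mail wrapping.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or the kernel): count
# "BUG: soft lockup" reports per stuck thread, reading syslog text on stdin.
count_soft_lockups() {
    # 1. keep only the soft-lockup report lines
    # 2. extract the trailing "[thread:pid]" tag, e.g. md2_raid1:358
    # 3. count how many times each tag occurs
    grep 'BUG: soft lockup' \
        | sed -n 's/.*\[\([^]]*\)\].*/\1/p' \
        | sort | uniq -c
}
```

Typical use would be `count_soft_lockups < /var/log/messages` after (or during) a run of `echo check > /sys/block/md2/md/sync_action`. A count that keeps growing for md2_raid1 points to repeated stalls rather than a one-off 10s hiccup.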
--
Steven Haigh
Email: netwiz@xxxxxxxxx
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Majed B.