On 17-Apr-13, at 5:04 PM, Helge Deller wrote:
>> Have you had a chance to try my patch on a UP machine? With the
>> additional locking, there's an increased chance that lockups might
>> occur. That's the risk.
>
> Yes, I'm running your patch on a UP (PA8600 CPU) and an SMP (PA8500, I
> think) machine. No lockups until now, only the do_softirq() crashes I
> mentioned above.

I don't think I should upload my Debian kernel build. It suffers badly
from the do_softirq() crashes. It gets to the login console and dies
either immediately or after I press Return:

[ ok ] Starting Postfix Mail Transport Agent: postfix.
Debian GNU/Linux 7.0 mx3210 ttyS1
mx3210 login: [ 235.148000] Backtrace:
[ 235.148000] [<0000000040116878>] do_softirq+0x50/0x68
[ 235.148000] [<0000000040146ad8>] irq_exit+0x60/0x80
[ 235.148000] [<000000004011baf4>] do_cpu_irq_mask+0x214/0x2a0
[ 235.148000] [<0000000040105074>] intr_return+0x0/0x4
[ 235.148000] [<00000000401040c0>] _switch_to_ret+0x0/0xf40
[ 235.148000]
[ 235.148000]
[ 235.148000] Kernel Fault: Code=26 regs=000000007ecf07f0
(Addr=0000000000000010)
[ 235.148000]
[ 235.148000] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[ 235.148000] PSW: 00001000000001000000000000001111 Not tainted
[ 235.148000] r00-03 000000000804000f 000000004065c080
0000000040146728 0000000000000001
[ 235.148000] r04-07 000000004080fd00 0000000000000048
000000000000000a 000000007ecf07c0
[ 235.148000] r08-11 0000000040824500 0000000000200040
0000000000000003 0000000040838d00
[ 235.148000] r12-15 0000000040755740 0000000040838500
0000000040837500 0000000040838d00
[ 235.148000] r16-19 0000000040824500 0000000000000100
0000000000000009 0000000042606b24
[ 235.148000] r20-23 ffe0000000000000 0000000042606020
8000000000000000 000000000000c7e0
[ 235.148000] r24-27 0000000000000001 0000000040660200
000000004065c0c8 000000004080fd00
[ 235.148000] r28-31 0000000000000000 000000007ecf07c0
000000007ecf07f0 0000000001d7f000
[ 235.148000] sr00-03 0000000000b16000 0000000000000000
0000000000000000 0000000000b16000
[ 235.148000] sr04-07 0000000000000000 0000000000000000
0000000000000000 0000000000000000
[ 235.148000]
[ 235.148000] IASQ: 0000000000000000 0000000000000000 IAOQ:
00000000401466bc 00000000401466c0
[ 235.148000] IIR: 53820020 ISR: 0000000000000000 IOR:
0000000000000010
[ 235.148000] CPU: 3 CR30: 000000007ecf0000 CR31:
ffffffffffffffff
[ 235.148000] ORIG_R28: 0000000000000000
[ 235.148000] IAOQ[0]: __do_softirq+0x144/0x280
[ 235.148000] IAOQ[1]: __do_softirq+0x148/0x280
[ 235.148000] RP(r2): __do_softirq+0x1b0/0x280
[ 235.148000] Backtrace:
[ 235.148000] [<0000000040116878>] do_softirq+0x50/0x68
[ 235.148000] [<0000000040146ad8>] irq_exit+0x60/0x80
[ 235.148000] [<000000004011baf4>] do_cpu_irq_mask+0x214/0x2a0
[ 235.148000] [<0000000040105074>] intr_return+0x0/0x4
[ 235.148000] [<00000000401040c0>] _switch_to_ret+0x0/0xf40
[ 235.148000]
[ 235.148000] Kernel panic - not syncing: Kernel Fault
This reminds me of the two hacks that I once had:
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 3aca9f2..b891626 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -582,6 +582,7 @@ out_eoi:
 void
 handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
 {
+	struct irqaction *action;
 	struct irq_chip *chip = irq_desc_get_chip(desc);

 	kstat_incr_irqs_this_cpu(irq, desc);
@@ -589,7 +590,9 @@ handle_percpu_irq(unsigned int irq, struct irq_desc *desc)
 	if (chip->irq_ack)
 		chip->irq_ack(&desc->irq_data);

-	handle_irq_event_percpu(desc, desc->action);
+	action = desc->action;
+	if (action)
+		handle_irq_event_percpu(desc, action);

 	if (chip->irq_eoi)
 		chip->irq_eoi(&desc->irq_data);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index ed567ba..0344acb 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -259,7 +259,7 @@ restart:
 		}
 		h++;
 		pending >>= 1;
-	} while (pending);
+	} while (pending && h >= (struct softirq_action *)0x1000);

 	local_irq_disable();

In the latter hack, I had concluded that we had run off the end of the
pending queue. You were going to ask around about this bug.
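
To make the "run off the pending queue" idea concrete, here is a small user-space model of a loop that walks a pending bitmask with a cursor into a handler table. It is only a sketch under stated assumptions, not the kernel's code: `NR_SOFTIRQS`, `softirq_vec`, `calls`, `install_handlers()`, `record_call()` and `run_pending()` are names chosen for this illustration. It also uses an explicit table-bounds check, which is a different guard from the `h >= 0x1000` test in the hack above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 10	/* assumption: illustrative table size */

struct softirq_action {
	void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static int calls[NR_SOFTIRQS];	/* per-slot call counters for the model */

/* Hypothetical handler: records which table slot was invoked. */
static void record_call(struct softirq_action *h)
{
	assert(h >= softirq_vec && h < softirq_vec + NR_SOFTIRQS);
	calls[h - softirq_vec]++;
}

/* Install the recording handler in every slot and reset the counters. */
static void install_handlers(void)
{
	for (int i = 0; i < NR_SOFTIRQS; i++) {
		softirq_vec[i].action = record_call;
		calls[i] = 0;
	}
}

/*
 * Walk the pending bitmask from bit 0 upward, advancing the cursor h in
 * step with the shift. Returns the number of handlers run, or -1 if a
 * bit is still pending once the cursor has left the table.
 */
static int run_pending(uint32_t pending)
{
	struct softirq_action *h = softirq_vec;
	int ran = 0;

	while (pending) {
		if (h >= softirq_vec + NR_SOFTIRQS)
			return -1;	/* cursor ran off the end of the table */
		if ((pending & 1) && h->action) {
			h->action(h);
			ran++;
		}
		h++;
		pending >>= 1;
	}
	return ran;
}
```

In this model the cursor can only leave the table when a stray bit is set above the last valid slot. It does not explain how the cursor could end up at a very low address, which is what the `0x1000` comparison in the hack tests for.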

Then I tried twice to boot 2.6.39-rc7+. Both attempts failed with lockups:

[ ok ] Starting Postfix Mail Transport Agent: postfix.
Debian GNU/Linux 7.0 mx3210 ttyS1
mx3210 login: BUG: soft lockup - CPU#3 stuck for 4278967496s! [swapper/3:0]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsiBUG: soft lockup -
CPU#2 stuck for 4278967496s! [swapper/2:0]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi nfsd exportfs ipv6 ext2 ext3 mbcache jbd dm_mod
zalon7xx lasi700 53c700 hilkbd sd_mod crc_t10dif sg sr_mod cdrom tg3
sym53c8xx pata_cmd64x scsi_transport_spi ptp pps_core libata scsi_mod
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Not tainted
r00-03 000000ff0804ff0f 000000004074fff0 00000000401255a0
000000007f0ec190
r04-07 000000004073c7f0 000000007f0ec1f0 0000000000000002
0000000000000002
r08-11 000000f0f0d08440 0200000000000000 000000000804000e
00000000407678fc
r12-15 0000000000000041 0000000040826500 0000000040837d00
0000000040660300
r16-19 fffffff0f0d00b0c 0000000000000004 0000000040826500
000000000800000e
r20-23 0000000001d75000 000000007f257e00 000000007f7c1cc0
000000000800000e
r24-27 000000000800000e 0000000000000000 000000004250d748
000000004073c7f0
r28-31 0000000000000008 000000007f0ec1f0 000000007f0ec220
0000000040684444
sr00-03 0000000000963000 0000000000963000 0000000000000000
0000000000963000
sr04-07 0000000000000000 0000000000000000 0000000000000000
0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401255b4
00000000401255b8
IIR: 03c008bc ISR: 000000004075eff0 IOR: ffffffffc0000000
CPU: 2 CR30: 000000007f0ec000 CR31: ffffffffffffffff
ORIG_R28: 000000004060ac30
IAOQ[0]: cpu_idle+0x8c/0xc0
IAOQ[1]: cpu_idle+0x90/0xc0
RP(r2): cpu_idle+0x78/0xc0
Backtrace:
[<0000000040767ab0>] smp_callin+0x1b8/0x1d8
BUG: soft lockup - CPU#1 stuck for 4278967496s! [swapper/1:0]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi nfsd exportfs ipv6 ext2 ext3 mbcache jbd dm_mod
zalon7xx lasi700 53c700 hilkbd sd_mod crc_t10dif sg sr_mod cdrom tg3
sym53c8xx pata_cmd64x scsi_transport_spi ptp pps_core libata scsi_mod
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000011001111111100001111 Not tainted
r00-03 000000ff080cff0f 000000004074fff0 00000000401255a0
000000007f0e4190
r04-07 000000004073c7f0 000000007f0e41f0 0000000000000001
0000000000000001
r08-11 000000f0f0d08440 0100000000000000 000000000804000e
00000000407678fc
r12-15 00000000409ba638 00000000409ba638 00000000405ec040
0000000000000001
r16-19 fffffff0f0d00b0c 000000007eab57a8 0000000040668580
000000000800000e
r20-23 0000000001d6b000 000000007f257ec0 000000007f7c1cc0
000000000800000e
r24-27 000000000800000e 0000000000000000 0000000042503748
000000004073c7f0
r28-31 0000000000000008 000000007f0e41f0 000000007f0e4220
0000000040684444
sr00-03 0000000000963000 0000000000963000 0000000000000000
0000000000963000
sr04-07 0000000000000000 0000000000000000 0000000000000000
0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401255c0
00000000401255b4
IIR: 0805025d ISR: 000000004075eff0 IOR: ffffffffc0000000
CPU: 1 CR30: 000000007f0e4000 CR31: ffffffffffffffff
ORIG_R28: 000000004060ac30
IAOQ[0]: cpu_idle+0x98/0xc0
IAOQ[1]: cpu_idle+0x8c/0xc0
RP(r2): cpu_idle+0x78/0xc0
Backtrace:
[<0000000040767ab0>] smp_callin+0x1b8/0x1d8
BUG: soft lockup - CPU#0 stuck for 4278967497s! [swapper/0:0]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi nfsd exportfs ipv6 ext2 ext3 mbcache jbd dm_mod
zalon7xx lasi700 53c700 hilkbd sd_mod crc_t10dif sg sr_mod cdrom tg3
sym53c8xx pata_cmd64x scsi_transport_spi ptp pps_core libata scsi_mod
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Not tainted
r00-03 000000ff0804ff0f 000000004074fff0 00000000401255a0
00000000405e82e0
r04-07 000000004073c7f0 00000000405e8340 0000000040691070
000000004078fb98
r08-11 0000000040691008 00000000424f6100 000000000804000e
000000004011b244
r12-15 0000000000000fe7 000000004067a768 0000000000000fe6
0000000000000001
r16-19 00000000f0d00b0c 0000000000000fe7 0000000000000fe6
000000000800000e
r20-23 0000000001d61000 000000000800000f 000000007f7c1cc0
000000000800000e
r24-27 000000000800000e 0000000000000000 00000000424f9748
000000004073c7f0
r28-31 00000000405e8000 00000000405e8340 00000000405e8370
0000000040684444
sr00-03 0000000000963000 0000000000963000 0000000000000000
0000000000963000
sr04-07 0000000000000000 0000000000000000 0000000000000000
0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401255b8
00000000401255bc
IIR: 539c0020 ISR: 000000004075eff0 IOR: ffffffffc0000000
CPU: 0 CR30: 00000000405e8000 CR31: 2001001408940008
ORIG_R28: 000000004060ac30
IAOQ[0]: cpu_idle+0x90/0xc0
IAOQ[1]: cpu_idle+0x94/0xc0
RP(r2): cpu_idle+0x78/0xc0
Backtrace:
[<000000004010bc48>] rest_init+0xe0/0xf8
[<0000000040760f14>] start_kernel+0x7a4/0x7d0
[<00000000404ec278>] rpc_pipe_ioctl+0xf0/0x118
[<00000000404adb4c>] ip_mroute_getsockopt+0x84/0x118
[<000000004048ae10>] udp_ioctl+0x80/0xc8
[<0000000040486ba0>] raw_sendmsg+0x290/0x8b0
[<0000000040465998>] do_tcp_getsockopt.isra.21+0x270/0x6c0
[<0000000040441864>] compat_sys_getsockopt+0x1ec/0x228
[<00000000404415b0>] compat_sys_setsockopt+0x1d8/0x2a0
[<0000000040440f00>] cmsghdr_from_user_compat_to_kern+0x2a8/0x2f8
[<0000000040440a9c>] get_compat_msghdr+0x11c/0x170
scsi_transport_iscsi nfsd exportfs ipv6 ext2 ext3 mbcache jbd dm_mod
zalon7xx lasi700 53c700 hilkbd sd_mod crc_t10dif sg sr_mod cdrom tg3
sym53c8xx pata_cmd64x scsi_transport_spi ptp pps_core libata scsi_mod
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111100001111 Not tainted
r00-03 000000ff0804ff0f 000000004074fff0 00000000401255a0
000000007f0f0190
r04-07 000000004073c7f0 000000007f0f01f0 0000000000000003
0000000000000003
r08-11 000000f0f0d08440 0300000000000000 000000000804000e
00000000407678fc
r12-15 000000004060ac30 000000004071b3b0 0000000000000000
0000000000000001
r16-19 fffffff0f0d00b0c 000000004074eff0 000000004250f750
000000000800000e
r20-23 0000000001d7f000 000000000800000f 000000007e2dc0c0
000000000800000e
r24-27 000000000800000e 0000000000000000 0000000042517748
000000004073c7f0
r28-31 000000007f0f0000 000000007f0f01f0 000000007f0f0220
0000000040684444
sr00-03 0000000000aa6000 0000000000000000 0000000000000000
0000000000aa6000
sr04-07 0000000000000000 0000000000000000 0000000000000000
0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401255b8
00000000401255bc
IIR: 539c0020 ISR: 000000004075eff0 IOR: ffffffffc0000000
CPU: 3 CR30: 000000007f0f0000 CR31: ffffffffffffffff
ORIG_R28: 000000004060ac30
IAOQ[0]: cpu_idle+0x90/0xc0
IAOQ[1]: cpu_idle+0x94/0xc0
RP(r2): cpu_idle+0x78/0xc0
Backtrace:
[<0000000040767ab0>] smp_callin+0x1b8/0x1d8

Since the number of seconds in the lockup messages is clearly wrong
(e.g., "CPU#0 stuck for 4278967497s!"), it occurred to me that something
wasn't being initialized properly. So I powered the machine down and
booted again. This time 3.9-rc7+ booted successfully.
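
As a rough check on why the reported duration looks like garbage rather than a real elapsed time, the arithmetic below reinterprets the logged value as a 32-bit quantity. This is only a sketch: the helper names `as_signed32()` and `distance_below_2_32()` are made up for the illustration, and treating the value as a 32-bit wraparound is an assumption, not something the log confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret the low 32 bits of v as a signed two's-complement value. */
static int64_t as_signed32(uint64_t v)
{
	uint32_t low = (uint32_t)v;

	return (low & 0x80000000u) ? (int64_t)low - 4294967296LL
				   : (int64_t)low;
}

/* How far v sits below 2^32; v must not exceed 2^32. */
static uint64_t distance_below_2_32(uint64_t v)
{
	assert(v <= 4294967296ULL);
	return 4294967296ULL - v;
}
```

With the logged value, `as_signed32(4278967496ULL)` gives -15999800 and `distance_below_2_32(4278967496ULL)` gives 15999800. Taken as seconds, 4278967496 would be well over a century, so a small negative difference or an uninitialized timestamp showing up as a huge unsigned number seems a more plausible reading than a real stall of that length.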
Dave
--
John David Anglin dave.anglin@xxxxxxxx