> Ok, the key in the trace is:
>
> Nov 2 16:25:30 titan kernel: [ 978.134874] CPU[ 1]: TSTATE[0000000080009603] TPC[000000000067d2e0] TNPC[000000000067d2d4] TASK[aptitude:3204]
> Nov 2 16:25:30 titan kernel: [ 978.257809] TPC[_write_unlock_irq+0x20/0x110]
> ...
> Nov 2 16:25:30 titan kernel: [ 978.507778] CPU[ 3]: TSTATE[0000000011009605] TPC[00000000004419f8] TNPC[00000000004419fc] TASK[aptitude:3203]
> Nov 2 16:25:30 titan kernel: [ 978.630707] TPC[cheetah_xcall_deliver+0x174/0x23c]
>
> The first symbol is misleading, it says _write_unlock_irq but actually
> in the assembler the PC is in the spinlock read spinning loop
> section. So actually it's hanging in _spin_lock().
>
> CPU #3 is trying to send a cross-call message interrupt, but for
> some reason that isn't making forward progress.
>
> Let's see what's calling these things by adding some more debugging
> information. Please retry the test with the following patch on
> top of the original sysrq-g debugging patch and please get new
> logs when it hangs.

Today I was a bit out of luck: either the machine crashed so badly that
it no longer reacted to anything, or it didn't crash at all.

The machine went amok a bit more slowly when I did the following, which
also produced the attached sysrq output:

- run stress -c 2 to get the load up (I didn't need that the last time)
- run something like
  `while true; do echo g > /proc/sysrq-trigger; sleep 0.5; done`
- run aptitude -u several times until the machine died.

So I'm not sure if the result is really useful for you; if not, just let
me know. I've attached the last ~10-20 sysrq-g outputs. As it was running
in a loop, I have a ton of them.

In case you're wondering: http is aptitude's http method.

We'll also run the patched kernel on a US II machine from tomorrow on,
but there it always took longer until it crashed, so we'll see if it
happens at all.

Thanks for your work,

Bernd

-- 
Bernd Zeimetz
<bernd@xxxxxxx> <http://bzed.de/>
Attachment:
sysrq2.txt
Description: application/pgp-keys