Re: debugging kernel hang (can't type anything)

Sukanto Ghosh <sukanto.cse.iitb@xxxxxxxxx> · Wed, 18 Feb 2009 14:28:48 +0530

Hi Peter,

I happened to come across this article:
http://stackframe.blogspot.com/2007/04/debugging-linux-kernels-with.html

With its help I was able to get a stack trace when the kernel hung.

The problem was a stupid one: I was holding a spinlock and then called
a function that tries to acquire the same lock again.  I now release
the lock before calling that function.
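
To make the mistake concrete, here is a minimal userspace sketch of the pattern. It is not the actual mm/thrash.c code: toy_lock, helper(), caller_buggy() and caller_fixed() are all hypothetical names, and the toy lock returns an error where a real spinlock would simply spin forever.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel spinlock: it only records who holds it. */
struct toy_lock {
    const void *owner;          /* NULL when the lock is free */
};

/* Returns 0 on success, -1 if the caller already holds the lock.
 * A real spinlock would spin forever here instead of returning. */
static int toy_lock_acquire(struct toy_lock *l, const void *me)
{
    if (l->owner == me)
        return -1;
    assert(l->owner == NULL);   /* single-threaded model: no contention */
    l->owner = me;
    return 0;
}

static void toy_lock_release(struct toy_lock *l, const void *me)
{
    assert(l->owner == me);
    l->owner = NULL;
}

/* Hypothetical helper that takes the lock itself. */
static int helper(struct toy_lock *l, const void *me, int *counter)
{
    if (toy_lock_acquire(l, me) != 0)
        return -1;
    (*counter)++;
    toy_lock_release(l, me);
    return 0;
}

/* Buggy: calls helper() while still holding the lock. */
int caller_buggy(struct toy_lock *l, const void *me, int *counter)
{
    int ret;

    if (toy_lock_acquire(l, me) != 0)
        return -1;
    (*counter)++;
    ret = helper(l, me, counter);   /* second acquire of the same lock */
    toy_lock_release(l, me);
    return ret;
}

/* Fixed: drops the lock before calling helper(). */
int caller_fixed(struct toy_lock *l, const void *me, int *counter)
{
    if (toy_lock_acquire(l, me) != 0)
        return -1;
    (*counter)++;
    toy_lock_release(l, me);
    return helper(l, me, counter);
}
```

One caveat with this kind of fix: between the release and the helper's own acquire, another CPU may take the lock and change the protected state. A common alternative is an unlocked variant of the helper that expects the caller to already hold the lock.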

I was working in mm/thrash.c.

But now I am getting a "BUG: spinlock recursion on CPU#0" error.  What
does this error mean?
Does it mean it is again spinning indefinitely on a spinlock?

I got the following backtrace:
#0  0xc0505f1b in delay_tsc (loops=1) at arch/x86/lib/delay.c:85
#1  0xc0505f77 in __udelay (usecs=3479683981) at arch/x86/lib/delay.c:118
#2  0xc700ad98 in ?? ()
#3  0xc05097ba in _raw_spin_lock (lock=0xc6c34998) at lib/spinlock_debug.c:116
#4  0xc0647509 in _spin_lock_bh (lock=0xc6c349a8) at kernel/spinlock.c:113
#5  0xc048189e in dmam_pool_match (dev=<value optimized out>, res=0x1cb,
    match_data=0x0) at mm/dmapool.c:457
#6  0x00000001 in ?? ()

While writing this mail I had paused my guest OS kernel from gdb (^C)
for some time, and when I resumed it (c) it printed: "Clocksource tsc
unstable (delta = 838972636559 ns)".
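
For scale, the reported delta works out to roughly 839 seconds, or about 14 minutes, which seems consistent with how long the guest sat stopped in gdb. A small sketch of the conversion (the helper names are mine, purely for illustration):

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Whole seconds contained in a nanosecond count. */
uint64_t ns_to_seconds(uint64_t ns)
{
    return ns / NSEC_PER_SEC;
}

/* Whole minutes contained in a nanosecond count. */
uint64_t ns_to_minutes(uint64_t ns)
{
    return ns / (60 * NSEC_PER_SEC);
}
```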

Regards,
Sukanto Ghosh

On Wed, Feb 18, 2009 at 4:52 AM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
> Would you like to share WHERE you made the change?  WHAT you did could be
> part of an academic exercise, so perhaps you want to keep it confidential,
> but WHERE would be helpful.
>
> I suspect (this is very common with changes to MM code) that you have done
> something illegal while holding a spinlock.  Knowing where you inserted the
> code will help us understand whether this is a problem or not.
>
> On Sat, Feb 14, 2009 at 7:52 PM, Sukanto Ghosh <sukanto.cse.iitb@xxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> I have made some changes to the memory management part of the kernel
>> as an experiment. Now when I boot into that kernel and start some
>> heavy processes (which cause paging), the kernel hangs. I can't even
>> type anything.
>>
>> I have gone through the 'paper on debugging kernel oops or hang'
>> (http://mail.nl.linux.org/kernelnewbies/2003-08/msg00347.html)
>>
>> In this paper Erik says that to get the stack trace we can type
>> 'Alt-SysRq-t', which prints the stack trace, and that when it is not
>> possible to type anything it is best to use a serial console. He says
>> the config for lilo would be: console=ttyS0,9600 console=tty0
>>
>> As I have grub I am using the following lines:
>>
>> default=0
>> timeout=15
>> title Fedora (2.6.27.4)
>>        root (hd0,0)
>>        kernel /boot/vmlinuz-2.6.27.4 ro root=/dev/sda1
>>        initrd /boot/initrd-2.6.27.4.img
>>        serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1
>>        terminal --dumb --timeout=10 serial console
>>
>>
>>
>> CONFIG_MAGIC_SYSRQ was enabled in my config file.
>>
>> My test kernel is running inside a virtual machine (VMware), with
>> its serial port 0 redirected to a file.
>> VM OS: Fedora Core 9 with modified kernel 2.6.27.4
>> Host OS: Ubuntu Hardy 2.6.24.3
>>
>> My problem is I am not getting any kind of output in the file to which
>> I redirected the serial port of the VM except a bunch of "Press any
>> key to continue .. " messages.
>>
>> Should I be providing the 'Alt-SysRq-t' input through the serial port,
>> and if so, how?
>> Can I connect a host terminal to the serial port of the VM?
>> VMware gives me three options for the serial port of the virtual machine:
>> i) connect it to a physical port of the host, ii) connect it to a named
>> pipe, and iii) connect it to a file on the host.
>>
>> Please help ...
>>
>>
>> --
>> Regards,
>> Sukanto Ghosh
>>
>> --
>> To unsubscribe from this list: send an email with
>> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
>> Please read the FAQ at http://kernelnewbies.org/FAQ
>>
>
>
>
> --
> Regards,
> Peter Teoh
>
