On Thu, Feb 19, 2009 at 8:14 AM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
> On Thu, Feb 19, 2009 at 3:19 AM, Sukanto Ghosh <sukanto.cse.iitb@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I am running into a peculiar problem here.
>>
>> I am getting stack traces from two sources: i) the spin_lock BUG_ON "cpu
>> recursion" check, and ii) a remote gdb running on my host and connected
>> to the guest kernel (the one I am debugging).
>>
>> The spin_lock BUG_ON stack trace says:
>> BUG: spinlock recursion on CPU#0, qsbench/2169
>>  lock: c6783618, .magic: dead4ead, .owner: qsbench/2169, .owner_cpu: 0
>> Pid: 2169, comm: qsbench Not tainted 2.6.27.4-sg-kgdb #2
>>  [<c0644bb7>] ? printk+0xf/0x18
>>  [<c05096ae>] spin_bug+0x7c/0x87
>>  [<c0509766>] _raw_spin_lock+0x35/0xfa
>>  [<c0647509>] _spin_lock+0x3d/0x4b
>>  [<c048189e>] grab_swap_token+0x20c/0x246    <===
>>  [<c0477873>] handle_mm_fault+0x35b/0x70a
>>  [<c0649352>] ? do_page_fault+0x2a6/0x722
>>  [<c0649425>] do_page_fault+0x379/0x722
>>  [<c0445d82>] ? __lock_acquire+0x7bd/0x814
>>  [<c0445d82>] ? __lock_acquire+0x7bd/0x814
>>  [<c043f9c4>] ? getnstimeofday+0x3c/0xd6
>>  ....
>>
>> And gdb shows:
>> #0  0xc0505f1b in delay_tsc (loops=1) at arch/x86/lib/delay.c:85
>> #1  0xc0505f77 in __udelay (usecs=2394097143) at arch/x86/lib/delay.c:118
>> #2  0xc6866d98 in ?? ()
>> #3  0xc05097ba in _raw_spin_lock (lock=0xc6783618) at lib/spinlock_debug.c:116
>> #4  0xc0647509 in _spin_lock_bh (lock=0xc6783628) at kernel/spinlock.c:113
>> #5  0xc048189e in dmam_pool_match (dev=<value optimized out>, res=0x2d5,
>>     match_data=0x0) at mm/dmapool.c:457    <===
>> #6  0x00000001 in ?? ()
>>
>> If you look at the lines I have marked with arrows, the address
>> 0xc048189e is attributed to a different function in each trace.
>> What is more surprising, System.map says:
>> c0481692 T grab_swap_token
>> c04818d8 t dmam_pool_match
>> (consistent with the spin_lock BUG_ON trace).
>>
>> But going through the output of 'objdump -s -l --source' I found that
>> 'dmam_pool_match' starts at address 0xc0481898
>> (consistent with the gdb back-trace output).
>
> I must say I am quite surprised by this feature of --source ... quite
> useful, but I am not sure how to use it yet. It really takes too LONG;
> normally I do "objdump -rd", which is much faster. But for finding an
> address, both approaches should give the same result.
>
> And the address does indeed have some problem. Are you sure you did not
> use an old copy of these files? (As you said, these are from two
> different physical sources; compiled code may differ between machines.)
> Normally, System.map's addresses should match the addresses from
> "objdump -rd". In your case it is c04818d8 vs c0481898, which is weird.
> Mine are identical.

Actually, the addresses produced by 'objdump -rd' do match those in
System.map. It was 'objdump -s -l --source' that was producing different
addresses. (It prints source fragments along with the disassembled code,
but I don't understand why it reports different addresses.)

I am compiling on the test virtual machine (which runs my test kernel) and
sending the vmlinux generated there to the host, for use with gdb.

Another finding: when I set a breakpoint at the 'grab_swap_token'
function, the address shown by gdb is different from the one in
System.map.

>>
>> I am confused about the source of the 'spinlock cpu recursion'. I have
>> checked the code I have written (the grab_swap_token modifications etc.
>> in mm/thrash.c), and it doesn't appear to be the source of the problem.
>> I haven't touched the function dmam_pool_match(), and it doesn't appear
>> in mm/thrash.c at all.
>>
>> >>> But now I am getting a "BUG: spinlock recursion on CPU#0" error.
>> >>> What does this error mean?
>> >>
>> >> Yes, it actually means that the previous lock has still not been
>> >> released, and now you are calling a function that acquires the
>> >> spinlock again. Before actually acquiring the spinlock, a check is
>> >> made:
>> >>
>> >> lib/spinlock_debug.c:
>> >>
>> >> static inline void
>> >> debug_spin_lock_before(spinlock_t *lock)
>> >> {
>> >>         SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
>> >>         SPIN_BUG_ON(lock->owner == current, lock, "recursion");
>> >>         SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(), lock,
>> >>                     "cpu recursion");
>> >> }
>> >>
>> >> So the "recursion" above was detected because the spinlock is being
>> >> reacquired on the same CPU that already holds it. Instead of going
>> >> into a tight spin, the kernel prints the stack trace beforehand.
>>
>> Does it mean that it's illegal for two processes (on the same CPU) to
>> spin on a lock that is held by someone else (again on the same CPU)?
>
> Yes, but it is very confusing. Normally, when you are talking about a
> spinlock on one CPU, the process will not be switched off the CPU until
> it has finished and released the spinlock. The rule is that a spinlock is
> meant for multi-CPU synchronization, NOT for multiple processes on one
> CPU. In fact, if another process were switched in and found the spinlock
> held by another process, the CPU would (in the past) go into a deadly
> spin and loop until the spinlock was unlocked; but the unlock can never
> happen, because the CPU is busy spinning. Now I think the code is wiser:
> it throws an error instead. If you find this happening, I think you have
> found a bug :-)

It was my fault again, something similar to what I had done earlier. And
this one doesn't hang... so no bug found ;-)  (I have appended a minimal
sketch of the kind of double-lock I mean at the end of this mail.)

BTW, I am using earlyprintk to redirect the printk output to the serial
port and log it, but it is also printing the output to the console. What
could be the reason? (I want the printk output to go to the usual logs +
ttyS0, and not to the console.)

--
Regards,
Sukanto Ghosh
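
P.S. For anyone who finds this thread in the archives: below is a minimal
sketch of the kind of double-acquire that debug_spin_lock_before() reports
as "recursion" when CONFIG_DEBUG_SPINLOCK is enabled. It is not my actual
patch; the lock and function names (my_lock, inner_helper, outer_path) are
made up for illustration.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);        /* hypothetical lock */

static void inner_helper(void)
{
        /*
         * Second acquire: the caller already holds my_lock on this CPU,
         * so lock->owner == current and the debug code prints
         * "BUG: spinlock recursion" instead of spinning forever.
         */
        spin_lock(&my_lock);
        /* ... */
        spin_unlock(&my_lock);
}

static void outer_path(void)
{
        spin_lock(&my_lock);            /* first acquire */
        inner_helper();                 /* re-enters spin_lock() on the same CPU */
        spin_unlock(&my_lock);
}

Without spinlock debugging the second spin_lock() would simply deadlock
that CPU; with it, the owner/owner_cpu checks fire and a trace like the
one above is printed.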