26.02.2014, 11:52, "Allen Pais" <allen.pais@xxxxxxxxxx>:
> Kirill,
>
>>>>>>> --- a/arch/sparc/mm/tsb.c
>>>>>>> +++ b/arch/sparc/mm/tsb.c
>>>>>>> @@ -6,6 +6,7 @@
>>>>>>> #include <linux/kernel.h>
>>>>>>> #include <linux/preempt.h>
>>>>>>> #include <linux/slab.h>
>>>>>>> +#include <linux/locallock.h>
>>>>>>> #include <asm/page.h>
>>>>>>> #include <asm/pgtable.h>
>>>>>>> #include <asm/mmu_context.h>
>>>>>>> @@ -14,6 +15,7 @@
>>>>>>> #include <asm/oplib.h>
>>>>> Yes, tb->active was set to zero.
>>>> If tb->active is zero, flush_tsb_user() is never called, because tlb_nr is permanently zero.
>>> Sorry, my bad. tb->active was set to one when I ran the test with the above patch.
>
> The CPU now does not stall; the change I made was to remove debug lockdep from the config.
> Now the system runs (cyclictest/hackbench), producing the two crashes below.
>
> 1. This is, as the message says, sleeping in atomic context. I am not sure who is holding the lock.
> [53990.477387] kernel BUG at kernel/rtmutex.c:738!
> [53990.477393]               \|/ ____ \|/
> [53990.477393]               "@'/ .. \`@"
> [53990.477393]               /_| \__/ |_\
> [53990.477393]                  \__U_/
> [53990.477396] hackbench(11777): Kernel bad sw trap 5 [#2]
> [53990.477403] CPU: 35 PID: 11777 Comm: hackbench Tainted: G D W 3.10.24-rt22+ #25
> [53990.477408] task: fffff80f931f9600 ti: fffff80f905ec000 task.ti: fffff80f905ec000
> [53990.477413] TSTATE: 0000004411e01600 TPC: 0000000000876ca4 TNPC: 0000000000876ca8 Y: 00000000 Tainted: G D W
> [53990.477419] TPC: <rt_spin_lock_slowlock+0x304/0x340>
> [53990.477423] g0: 000000000000000e g1: 0000000000000000 g2: 0000000000000000 g3: 0000000000de1800
> [53990.477427] g4: fffff80f931f9600 g5: fffff80fd74a0000 g6: fffff80f905ec000 g7: 726e656c2f72746d
> [53990.477430] o0: 00000000009bcee8 o1: 00000000000002e2 o2: 0000000000000000 o3: 0000000000000001
> [53990.477434] o4: 0000000000000002 o5: 0000000000000000 sp: fffff80f905ee6f1 ret_pc: 0000000000876c9c
> [53990.477439] RPC: <rt_spin_lock_slowlock+0x2fc/0x340>
> [53990.477444] l0: fffff80f905eefb0 l1: fffff80f931f9600 l2: fffff80f931f9c50 l3: 0000000000a8d800
> [53990.477448] l4: 0000000000000000 l5: 0000000000de1400 l6: 0000000000de1440 l7: 0000000000000001
> [53990.477452] i0: fffff80f9026ae70 i1: 0000000000000293 i2: 0000000000000000 i3: 0000000000000000
> [53990.477456] i4: 0000000000000002 i5: 0000000000000001 i6: fffff80f905ee831 i7: 0000000000876ee0
> [53990.477462] I7: <rt_spin_lock+0x20/0x60>
> [53990.477464] Call Trace:
> [53990.477470] [0000000000876ee0] rt_spin_lock+0x20/0x60
> [53990.477476] [000000000052ee60] unmap_single_vma+0x200/0x6c0
> [53990.477482] [000000000052f348] unmap_vmas+0x28/0x60
> [53990.477488] [0000000000531868] exit_mmap+0x88/0x160
> [53990.477492] [000000000045e0e8] mmput+0x48/0x100
> [53990.477496] [0000000000466a3c] do_exit+0x1fc/0xa40
> [53990.477500] [0000000000427f00] die_if_kernel+0x1a0/0x340
> [53990.477506] [00000000004294a8] sun4v_data_access_exception+0x108/0x120
> [53990.477512] [0000000000406c08] sun4v_dacc+0x28/0x34
> [53990.477517] [0000000000407b64] tsb_flush+0x4/0x40
> [53990.477523] [00000000004515a8] flush_tlb_pending+0x68/0xe0
> [53990.477528] [0000000000451800] tlb_batch_add+0x1e0/0x200
> [53990.477534] [000000000053cad8] ptep_clear_flush+0x38/0x60
> [53990.477539] [000000000052b47c] do_wp_page+0x1dc/0x880
> [53990.477544] [000000000052beac] handle_pte_fault+0x38c/0x7c0
> [53990.477548] [000000000052cab8] handle_mm_fault+0xd8/0x160
>
> and
>
> 2.
> [53998.070198] BUG: NMI Watchdog detected LOCKUP on CPU35, ip 0042f608, registers:
> [53998.070206] CPU: 35 PID: 11694 Comm: hackbench Tainted: G D W 3.10.24-rt22+ #25
> [53998.070211] task: fffff80f91c20000 ti: fffff80f8f40c000 task.ti: fffff80f8f40c000
> [53998.070216] TSTATE: 0000000011e01606 TPC: 000000000042f608 TNPC: 000000000042f60c Y: 00000000 Tainted: G D W
> [53998.070236] TPC: <stick_get_tick+0x8/0x20>
> [53998.070241] g0: 0000000000000000 g1: 000000000042f600 g2: 00000000076c64ec g3: 0000000007a9b280
> [53998.070246] g4: fffff80f91c20000 g5: fffff80fd74a0000 g6: fffff80f8f40c000 g7: 0000000000000000
> [53998.070251] o0: 0000000000000001 o1: fffff80f8f40c400 o2: 000000000042fa28 o3: 0000000000000000
> [53998.070255] o4: 000000000000004f o5: 0000000000000002 sp: fffff80f8f40ee01 ret_pc: 00000000004209f4
> [53998.070264] RPC: <tl0_irq15+0x14/0x20>
> [53998.070267] l0: 0000000000001000 l1: 0000000011001605 l2: 000000000042fa24 l3: 0000000000000400
> [53998.070270] l4: 000000000000000e l5: 0000000000000001 l6: 0000000000000000 l7: 0000000000000008
> [53998.070272] i0: 0000311023c1caaa i1: fffff80f8f40c400 i2: 000000000066f8b0 i3: 0000000000000000
> [53998.070275] i4: fffff80f8ab8e098 i5: fffff80f893f2a70 i6: fffff80f8f40eeb1 i7: 000000000042fa10
> [53998.070280] I7: <__delay+0x10/0x60>
> [53998.070282] Call Trace:
> [53998.070286] [000000000042fa10] __delay+0x10/0x60
> [53998.070291] [000000000066f8b8] do_raw_spin_lock+0xb8/0x120
> [53998.070300] [0000000000877b08] _raw_spin_lock_irqsave+0x68/0xa0
> [53998.070306] [0000000000452074] flush_tsb_user+0x14/0x120
> [53998.070309] [00000000004515a8] flush_tlb_pending+0x68/0xe0
> [53998.070312] [0000000000451800] tlb_batch_add+0x1e0/0x200
> [53998.070325] [000000000053cad8] ptep_clear_flush+0x38/0x60
> [53998.070328] [000000000052b47c] do_wp_page+0x1dc/0x880
> [53998.070331] [000000000052beac] handle_pte_fault+0x38c/0x7c0
> [53998.070334] [000000000052cab8] handle_mm_fault+0xd8/0x160
> [53998.070339] [0000000000879724] do_sparc64_fault+0x404/0x700
> [53998.070342] [0000000000407ae0] sparc64_realfault_common+0x10/0x20
>
> But strangely, during boot-up I have more crash messages.
> Here's what I see:
>
> [ 520.570799] BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
> [ 520.570802] in_atomic(): 0, irqs_disabled(): 1, pid: 2140, name: modprobe
> [ 520.570803] INFO: lockdep is turned off.
> [ 520.570805] irq event stamp: 4502
> [ 520.570806] hardirqs last enabled at (4501): [<00000000004d68c4>] rcu_note_context_switch+0xa4/0x300
> [ 520.570815] hardirqs last disabled at (4502): [<0000000000877a30>] _raw_spin_lock_irq+0x10/0x80
> [ 520.570822] softirqs last enabled at (0): [<000000000045eb58>] copy_process+0x418/0x1080
> [ 520.570828] softirqs last disabled at (0): [< (null)>] (null)
> [ 520.570834] CPU: 18 PID: 2140 Comm: modprobe Tainted: G W 3.10.24-rt22+ #25
> [ 520.570835] Call Trace:
> [ 520.570842] [0000000000495f0c] __might_sleep+0xec/0x160
> [ 520.570846] [0000000000876ed8] rt_spin_lock+0x18/0x60
> [ 520.570852] [00000000006e0f78] sunhv_console_write_paged+0x1d8/0x200
> [ 520.570855] [00000000004625e0] call_console_drivers.clone.2+0x120/0x1c0
> [ 520.570858] [0000000000462a14] console_unlock+0x394/0x400
> [ 520.570861] [0000000000463108] vprintk_emit+0x3a8/0x5a0
> [ 520.570863] [0000000000874378] printk+0x38/0x4c
> [ 520.570874] [000000001024e78c] _base_make_ioc_operational+0xeac/0x1440 [mpt2sas]
> [ 520.570882] [0000000010253100] mpt2sas_base_attach+0x1720/0x1ae0 [mpt2sas]
> [ 520.570893] [000000001025b4fc] _scsih_probe+0x4fc/0x700 [mpt2sas]
> [ 520.570900] [0000000000686120] local_pci_probe+0x20/0x40
> [ 520.570903] [000000000068680c] pci_device_probe+0xec/0x100
> [ 520.570907] [00000000006ee574] driver_probe_device+0x74/0x220
> [ 520.570909] [00000000006ee7a8] __driver_attach+0x88/0xa0
> [ 520.570913] [00000000006eca0c] bus_for_each_dev+0x6c/0xa0
> [ 520.570916] [00000000006ee39c] driver_attach+0x1c/0x40

This should be fixed the same way it was done for 8250 [patch drivers-serial-cleanup-locking-for-rt.patch].

> and this one
>
> [ 519.160755] =================================
> [ 519.160756] [ INFO: inconsistent lock state ]
> [ 519.160760] 3.10.24-rt22+ #25 Not tainted
> [ 519.160761] ---------------------------------
> [ 519.160763] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 519.160766] irq/36-MSIQ/640 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 519.160778] (&irq_desc_lock_class){?.....}, at: [<00000000004d314c>] handle_simple_irq+0xc/0xe0
> [ 519.160779] {IN-HARDIRQ-W} state was registered at:
> [ 519.160785] [<00000000004ba7c0>] lock_acquire+0x60/0x100
> [ 519.160791] [<0000000000877930>] _raw_spin_lock+0x30/0x80
> [ 519.160795] [<00000000004d2e6c>] handle_fasteoi_irq+0xc/0x180
> [ 519.160800] [<00000000004cf118>] generic_handle_irq+0x38/0x60
> [ 519.160804] [<000000000087bf44>] handler_irq+0xc4/0x100
> [ 519.160808] [<0000000000426b2c>] valid_addr_bitmap_patch+0x74/0x288
> [ 519.160812] [<000000000042ced4>] arch_cpu_idle+0x54/0xe0
> [ 519.160817] [<00000000004a85bc>] cpu_startup_entry+0x19c/0x340
> [ 519.160822] [<0000000000870f18>] smp_callin+0x100/0x110
> [ 519.160825] [<0000000000870a78>] after_lock_tlb+0x1ac/0x1c4
> [ 519.160827] [< (null)>] (null)
> [ 519.160829] irq event stamp: 19
> [ 519.160832] hardirqs last enabled at (19): [<0000000000877cc4>] _raw_spin_unlock_irq+0x24/0x60
> [ 519.160835] hardirqs last disabled at (18): [<0000000000877a30>] _raw_spin_lock_irq+0x10/0x80
> [ 519.160841] softirqs last enabled at (0): [<000000000045eb58>] copy_process+0x418/0x1080
> [ 519.160843] softirqs last disabled at (0): [< (null)>] (null)
> [ 519.160844]
> [ 519.160844] other info that might help us debug this:
> [ 519.160844] Possible unsafe locking scenario:
> [ 519.160844]
> [ 519.160845]        CPU0
> [ 519.160845]        ----
> [ 519.160847]   lock(&irq_desc_lock_class);
> [ 519.160848]   <Interrupt>
> [ 519.160850]     lock(&irq_desc_lock_class);
> [ 519.160850]
> [ 519.160850] *** DEADLOCK ***
> [ 519.160850]
> [ 519.160852] no locks held by irq/36-MSIQ/640.
> [ 519.160853]
> [ 519.160853] stack backtrace:
> [ 519.160855] CPU: 9 PID: 640 Comm: irq/36-MSIQ Not tainted 3.10.24-rt22+ #25
> [ 519.160856] Call Trace:
> [ 519.160860] [00000000004b50b4] print_usage_bug+0x234/0x2e0
> [ 519.160862] [00000000004b5728] mark_lock+0x5c8/0x800
> [ 519.160864] [00000000004ba238] __lock_acquire+0x7b8/0xce0
> [ 519.160866] [00000000004ba7c0] lock_acquire+0x60/0x100
> [ 519.160868] [0000000000877930] _raw_spin_lock+0x30/0x80
> [ 519.160870] [00000000004d314c] handle_simple_irq+0xc/0xe0
> [ 519.160872] [00000000004cf118] generic_handle_irq+0x38/0x60
> [ 519.160877] [0000000000447870] sparc64_msiq_interrupt+0x50/0x120
> [ 519.160880] [00000000004d05fc] irq_forced_thread_fn+0x1c/0x80
> [ 519.160883] [00000000004d019c] irq_thread+0xdc/0x140
> [ 519.160888] [0000000000489560] kthread+0x80/0xa0
> [ 519.160893] [0000000000406104] ret_from_syscall+0x1c/0x2c
> [ 519.160894] [0000000000000000] (null)
> [ 519.160897] ------------[ cut here ]------------
>
> What do you think?
>
> - Allen
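
For reference, the tb->active exchange quoted at the top of this mail can be illustrated with a small userspace model. This is only a sketch under assumptions: it is not the code from arch/sparc/mm/tlb.c, every model_* name and MODEL_BATCH_NR are made up for illustration, and the two counters merely stand in for the batched flush (flush_tsb_user() via flush_tlb_pending()) and for a per-page flush. It shows why a batch that is never active cannot reach the batched flush: nothing is ever queued, so tlb_nr stays zero.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative batch size; the real kernel constant differs (assumption). */
#define MODEL_BATCH_NR 4

/* Hypothetical stand-in for the per-CPU TLB batch, not the kernel struct. */
struct model_tlb_batch {
    int active;                           /* nonzero while batching is enabled */
    size_t tlb_nr;                        /* number of queued addresses */
    unsigned long vaddrs[MODEL_BATCH_NR]; /* queued addresses */
    unsigned batch_flushes;               /* models flush_tsb_user() calls */
    unsigned page_flushes;                /* models immediate per-page flushes */
};

/* Models flush_tlb_pending(): the batched flush runs only if something is queued. */
static void model_flush_pending(struct model_tlb_batch *tb)
{
    if (tb->tlb_nr) {
        tb->batch_flushes++;
        tb->tlb_nr = 0;
    }
}

/* Models tlb_batch_add(): queue the address when active, else flush right away. */
static void model_batch_add(struct model_tlb_batch *tb, unsigned long vaddr)
{
    if (!tb->active) {
        /* Nothing is queued, so tlb_nr never leaves zero. */
        tb->page_flushes++;
        return;
    }

    assert(tb->tlb_nr < MODEL_BATCH_NR);
    tb->vaddrs[tb->tlb_nr++] = vaddr;

    /* A full batch is flushed immediately. */
    if (tb->tlb_nr >= MODEL_BATCH_NR)
        model_flush_pending(tb);
}
```

With active set to zero, any number of model_batch_add() calls leaves batch_flushes at zero. With active set to one, the batched flush is reached from inside model_batch_add() once the batch fills, which matches the tlb_batch_add -> flush_tlb_pending -> flush_tsb_user path seen in the second trace.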