On 11/21/2018 09:04 PM, Sergey Senozhatsky wrote: > On (11/21/18 11:49), Waiman Long wrote: > [..] >>> case ODEBUG_STATE_ACTIVE: >>> - debug_print_object(obj, "init"); >>> state = obj->state; >>> raw_spin_unlock_irqrestore(&db->lock, flags); >>> + debug_print_object(obj, "init"); >>> debug_object_fixup(descr->fixup_init, addr, state); >>> return; >>> >>> case ODEBUG_STATE_DESTROYED: >>> - debug_print_object(obj, "init"); >>> + debug_printobj = true; >>> break; >>> default: >>> break; >>> } >>> >>> raw_spin_unlock_irqrestore(&db->lock, flags); >>> + if (debug_chkstack) >>> + debug_object_is_on_stack(addr, onstack); >>> + if (debug_printobj) >>> + debug_print_object(obj, "init"); >>> > [..] >> As a side note, one of the test systems that I used generated a >> debugobjects splat in the bootup process and the system hanged >> afterward. Applying this patch alone fix the hanging problem and the >> system booted up successfully. So it is not really a good idea to call >> printk() while holding a raw spinlock. > Right, I like this patch. > And I think that we, maybe, can go even further. > > Some serial consoles call mod_timer(). So what we could have with the > debug objects enabled was > > mod_timer() > lock_timer_base() > debug_activate() > printk() > call_console_drivers() > foo_console() > mod_timer() > lock_timer_base() << deadlock > > That's one possible scenario. The other one can involve console's > IRQ handler, uart port spinlock, mod_timer, debug objects, printk, > and an eventual deadlock on the uart port spinlock. This one can > be mitigated with printk_safe. But mod_timer() deadlock will require > a different fix. > > So maybe we need to switch debug objects print-outs to _always_ > printk_deferred(). Debug objects can be used in code which cannot > do direct printk() - timekeeping is just one example. > > -ss Actually, I don't think that was the cause of the hang. The debugobjects splat was caused by debug_object_is_on_stack(), below was the output: [ 6.890048] ODEBUG: object (____ptrval____) is NOT on stack (____ptrval____), but annotated. [ 6.891000] WARNING: CPU: 28 PID: 1 at lib/debugobjects.c:369 __debug_object_init.cold.11+0x51/0x2d6 [ 6.891000] Modules linked in: [ 6.891000] CPU: 28 PID: 1 Comm: swapper/0 Not tainted 4.18.0-41.el8.bz1651764_cgroup_debug.x86_64+debug #1 [ 6.891000] Hardware name: HPE ProLiant DL120 Gen10/ProLiant DL120 Gen10, BIOS U36 11/14/2017 [ 6.891000] RIP: 0010:__debug_object_init.cold.11+0x51/0x2d6 [ 6.891000] Code: ea 03 80 3c 02 00 0f 85 85 02 00 00 49 8b 54 24 18 48 89 de 4c 89 44 24 10 48 c7 c7 00 ce 22 94 e8 73 18 62 ff 4c 8b 44 24 10 <0f> 0b e9 60 db ff ff 41 83 c4 01 b8 ff ff 37 00 44 89 25 ce 46 f9 [ 6.891000] RSP: 0000:ffff880104187960 EFLAGS: 00010086 [ 6.891000] RAX: 0000000000000050 RBX: ffffffff9764c570 RCX: 0000000000000000 [ 6.891000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880104178ca8 [ 6.891000] RBP: 1ffff10020830f34 R08: ffff8807ce68a1d0 R09: fffffbfff2923554 [ 6.891000] R10: fffffbfff2923554 R11: ffffffff9491aaa3 R12: ffff880104178000 [ 6.891000] R13: ffffffff96c809b8 R14: 000000000000a370 R15: ffff8807ce68a1c0 [ 6.891000] FS: 0000000000000000(0000) GS:ffff8807d4200000(0000) knlGS:0000000000000000 [ 6.891000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.891000] CR2: 0000000000000000 CR3: 000000028de16001 CR4: 00000000007606e0 [ 6.891000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6.891000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6.891000] PKRU: 00000000 [ 6.891000] Call Trace: [ 6.891000] ? debug_object_fixup+0x30/0x30 [ 6.891000] ? _raw_spin_unlock_irqrestore+0x4b/0x60 [ 6.891000] ? __lockdep_init_map+0x12f/0x510 [ 6.891000] ? __lockdep_init_map+0x12f/0x510 [ 6.891000] virt_efi_get_next_variable+0xa2/0x160 [ 6.891000] efivar_init+0x1c4/0x6d7 [ 6.891000] ? efivar_ssdt_setup+0x3b/0x3b [ 6.891000] ? efivar_entry_iter+0x120/0x120 [ 6.891000] ? find_held_lock+0x3a/0x1c0 [ 6.891000] ? lock_downgrade+0x5e0/0x5e0 [ 6.891000] ? kmsg_dump_rewind_nolock+0xd9/0xd9 [ 6.891000] ? _raw_spin_unlock_irqrestore+0x4b/0x60 [ 6.891000] ? trace_hardirqs_on_caller+0x381/0x570 [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 [ 6.891000] efisubsys_init+0x1be/0x4ae [ 6.891000] ? kernfs_get.part.8+0x4c/0x60 [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 [ 6.891000] ? __kernfs_create_file+0x235/0x2e0 [ 6.891000] ? efivar_ssdt_iter+0x1f4/0x1f4 [ 6.891000] do_one_initcall+0xe9/0x5fd [ 6.891000] ? perf_trace_initcall_level+0x450/0x450 [ 6.891000] ? __wake_up_common+0x5a0/0x5a0 [ 6.891000] ? lock_downgrade+0x5e0/0x5e0 [ 6.891000] kernel_init_freeable+0x51a/0x5f2 [ 6.891000] ? start_kernel+0x7b8/0x7b8 [ 6.891000] ? finish_task_switch+0x19a/0x690 [ 6.891000] ? __switch_to_asm+0x40/0x70 [ 6.891000] ? __switch_to_asm+0x34/0x70 [ 6.891000] ? rest_init+0xe9/0xe9 [ 6.891000] kernel_init+0xc/0x110 [ 6.891000] ? rest_init+0xe9/0xe9 [ 6.891000] ret_from_fork+0x24/0x50 [ 6.891000] irq event stamp: 1081352 [ 6.891000] hardirqs last enabled at (1081351): [<ffffffff93af7dab>] _raw_spin_unlock_irqrestore+0x4b/0x60 [ 6.891000] hardirqs last disabled at (1081352): [<ffffffff93af85c2>] _raw_spin_lock_irqsave+0x22/0x81 [ 6.891000] softirqs last enabled at (1081334): [<ffffffff93e006f9>] __do_softirq+0x6f9/0xaa0 [ 6.891000] softirqs last disabled at (1081325): [<ffffffff921b993f>] irq_exit+0x27f/0x2d0 [ 6.891000] ---[ end trace 15e1083fc009a526 ]--- All the messages above were printed while holding a raw spinlock with IRQ disabled. Further down the bootup sequence, the system appeared to hang: 11.270654] systemd[1]: systemd 239 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy) [ 11.311307] systemd[1]: Detected architecture x86-64. [ 11.316420] systemd[1]: Running in initial RAM disk. Welcome to The system is not responsive at this point. I am not totally sure what caused this. Maybe it was caused by disabling IRQ for too long leading to some kind of corruption. Anyway, moving debug_object_is_on_stack() outside of the IRQ disabled lock critical section seemed to fix the hang problem. Cheers, Longman