Re: netconsole deadlock with virtnet

On (20/11/17 09:33), Steven Rostedt wrote:
> > [   21.149601]     IN-HARDIRQ-W at:
> > [   21.149602]                          __lock_acquire+0xa78/0x1a94
> > [   21.149603]                          lock_acquire.part.0+0x170/0x360
> > [   21.149604]                          lock_acquire+0x68/0x8c
> > [   21.149605]                          console_unlock+0x1e8/0x6a4
> > [   21.149606]                          vprintk_emit+0x1c4/0x3c4
> > [   21.149607]                          vprintk_default+0x40/0x4c
> > [   21.149608]                          vprintk_func+0x10c/0x220
> > [   21.149610]                          printk+0x68/0x90
> > [   21.149611]                          crng_fast_load+0x1bc/0x1c0
> > [   21.149612]                          add_interrupt_randomness+0x280/0x290
> > [   21.149613]                          handle_irq_event+0x80/0x120
> > [   21.149614]                          handle_fasteoi_irq+0xac/0x200
> > [   21.149615]                          __handle_domain_irq+0x84/0xf0
> > [   21.149616]                          gic_handle_irq+0xd4/0x320
> > [   21.149617]                          el1_irq+0xd0/0x180
> > [   21.149618]                          arch_cpu_idle+0x24/0x44
> > [   21.149619]                          default_idle_call+0x48/0xa0
> > [   21.149620]                          do_idle+0x260/0x300
> > [   21.149621]                          cpu_startup_entry+0x30/0x6c
> > [   21.149622]                          rest_init+0x1b4/0x288
> > [   21.149624]                          arch_call_rest_init+0x18/0x24
> > [   21.149625]                          start_kernel+0x5cc/0x608
> > [   21.149625]     IN-SOFTIRQ-W at:
> > [   21.149627]                          __lock_acquire+0x894/0x1a94
> > [   21.149628]                          lock_acquire.part.0+0x170/0x360
> > [   21.149629]                          lock_acquire+0x68/0x8c
> > [   21.149630]                          console_unlock+0x1e8/0x6a4
> > [   21.149631]                          vprintk_emit+0x1c4/0x3c4
> > [   21.149632]                          vprintk_default+0x40/0x4c
> > [   21.149633]                          vprintk_func+0x10c/0x220
> > [   21.149634]                          printk+0x68/0x90
> > [   21.149635]                          hrtimer_interrupt+0x290/0x294
> > [   21.149636]                          arch_timer_handler_virt+0x3c/0x50
> > [   21.149637]                          handle_percpu_devid_irq+0x94/0x164
> > [   21.149673]                          __handle_domain_irq+0x84/0xf0
> > [   21.149674]                          gic_handle_irq+0xd4/0x320
> > [   21.149675]                          el1_irq+0xd0/0x180
> > [   21.149676]                          __do_softirq+0x108/0x638
> > [   21.149677]                          __irq_exit_rcu+0x17c/0x1b0
> > [   21.149678]                          irq_exit+0x18/0x44
> > [   21.149679]                          __handle_domain_irq+0x88/0xf0
> > [   21.149680]                          gic_handle_irq+0xd4/0x320
> > [   21.149681]                          el1_irq+0xd0/0x180
> > [   21.149682]                          smp_call_function_many_cond+0x3cc/0x3f0
> > [   21.149683]                          kick_all_cpus_sync+0x4c/0x80
> > [   21.149684]                          load_module+0x1eec/0x2734
> > [   21.149685]                          __do_sys_finit_module+0xbc/0x12c
> > [   21.149686]                          __arm64_sys_finit_module+0x28/0x34
> > [   21.149687]                          el0_svc_common.constprop.0+0x84/0x200
> > [   21.149688]                          do_el0_svc+0x2c/0x90
> > [   21.149689]                          el0_svc+0x18/0x50
> > [   21.149690]                          el0_sync_handler+0xe0/0x350
> > [   21.149691]                          el0_sync+0x158/0x180

[..]

> It really sucks that we lose 190 messages that would help to decipher this
> more. :-p

Indeed.

> Because I'm not sure where the xmit_lock is taken while holding the
> target_list_lock.

I don't see where this happens. It seems to me that the report is
not about a broken locking order, but rather about:
- a soft IRQ can be preempted (while holding _xmit_lock) by a hardware
  interrupt, which then attempts to acquire the same _xmit_lock.

   CPU0
   <<soft IRQ>>
    virtnet_poll_tx()
     __netif_tx_lock()
      spin_lock(_xmit_lock)
   <<hard IRQ>>
    add_interrupt_randomness()
     crng_fast_load()
      printk()
       call_console_drivers()
        spin_lock_irqsave(&target_list_lock)
         spin_lock(_xmit_lock)

Does this make sense?

	-ss
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization