openbsd 6.2 locking on apic related code on 4.13.8 / qemu-kvm 2.9.1

Hi,

I've read https://www.linux-kvm.org/page/Bugs and I'm looking for the
best place to report, or find help debugging, the issue detailed here
(I can't tell whether it's a qemu-kvm bug or a low-level kvm bug):
https://marc.info/?l=openbsd-bugs&m=151263207510100&w=2
(Note that there are other reports, which might not be the same issue; cf.
https://marc.info/?l=openbsd-bugs&w=2&r=1&s=proxmox&q=b)

More details:
- The host is running Proxmox 5.1/x86_64, up to date, with kernel 4.13.8-2-pve
  and qemu-kvm 2.9.1. x2apic is enabled in the BIOS, and the option
  is recognized by the kernel.

$uname -a
Linux umbrail 4.13.8-2-pve #1 SMP PVE 4.13.8-28 (Wed, 29 Nov 2017 09:49:35 +0100) x86_64 GNU/Linux
$kvm --version
QEMU emulator version 2.9.1 pve-qemu-kvm_2.9.1-3

- The guest is running OpenBSD/amd64 6.2 with virtio drivers for
  network/disk, and a kernel rebuilt with option MP_LOCKDEBUG so that it
  drops to the kernel debugger on MP delays/lockups.

- The emulated CPU is either kvm64 or host (Xeon(R) CPU E5-2603 v4).

There are two kinds of issues:
- Hard locks/deadlocks upon guest reboot, with the following trace inside
  OpenBSD's kernel (the CPU is stuck waiting on this instruction):

  ddb{0}> tr
  x2apic_readreg(10) at x2apic_readreg+0xf
  lapic_delay(ffff800022136900) at lapic_delay+0x5c
  rtcput(ffff800022136960) at rtcput+0x65
  resettodr() at resettodr+0x1d6
  perform_resettodr(ffffffff81769b29) at perform_resettodr+0x9
  taskq_thread(0) at taskq_thread+0x67
  end trace frame: 0x0, count: -6

Disabling x2apic (via -x2apic on the -cpu flags line) seemed to help for
a while, but the guest still sometimes locks up upon reboot.

- Hard locks/deadlocks during normal operation, also hinting at the APIC:
ddb{1}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 216909  39313    726           0  0x4000000    2  rrdcached
*366090  74010      0     0x14000      0x200    1  systq
ddb{1}> tr /p 0t366090
lapic_delay(ffffffff81a97c88) at lapic_delay+0x5c
rtcget(ffffffff81a97c88) at rtcget+0x1a
resettodr() at resettodr+0x3a
perform_resettodr(ffffffff8150a7d9) at perform_resettodr+0x9
taskq_thread(0) at taskq_thread+0x67
end trace frame: 0x0, count: -5

Both issues seem to point to problems in the APIC emulation code.
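
To illustrate why a timer counter that stops advancing would produce this
kind of hang, here is a minimal Python model of a busy-wait delay loop. It is
only a sketch: it is not the OpenBSD lapic_delay() source, and the function
names, the down-counting wrapping counter, and the poll budget are all
assumptions made for illustration.

```python
# Illustrative model only: NOT the OpenBSD lapic_delay() implementation.
# All names and the max_polls budget are hypothetical.

def busy_wait_delay(read_counter, ticks_to_wait, reload_value, max_polls=10_000):
    """Poll a down-counting, wrapping timer until ticks_to_wait ticks elapse.

    Returns the number of polls used, or None if the poll budget ran out.
    A kernel loop without such a budget would simply spin forever.
    """
    remaining = ticks_to_wait
    previous = read_counter()
    for polls in range(1, max_polls + 1):
        current = read_counter()
        if current > previous:
            # The counter wrapped past zero and was reloaded.
            elapsed = previous + (reload_value - current)
        else:
            elapsed = previous - current
        remaining -= elapsed
        previous = current
        if remaining <= 0:
            return polls
    return None


def make_counter(start, step, reload_value):
    """Return a reader for a simulated counter that drops by step per read."""
    state = {"value": start}

    def read():
        value = state["value"]
        state["value"] = (value - step) % reload_value
        return value

    return read


# A counter that advances lets the delay finish; a stalled one never does.
healthy = make_counter(start=999, step=7, reload_value=1000)
stalled = make_counter(start=999, step=0, reload_value=1000)

healthy_polls = busy_wait_delay(healthy, ticks_to_wait=500, reload_value=1000)
stalled_polls = busy_wait_delay(stalled, ticks_to_wait=500, reload_value=1000)

print("healthy counter finished:", healthy_polls is not None)
print("stalled counter finished:", stalled_polls is not None)
```

If the emulated counter behaved like the stalled case, a delay loop of this
shape would never return, which would be consistent with both traces ending
in lapic_delay. That is a guess on my part, not something I have verified.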

OpenBSD developers think it's not an OpenBSD issue. Proxmox people would
say it's something in kvm, not in Proxmox itself.

Some questions from that point:

- Is it low-level and kvm-related, or userspace/qemu-related (and
  in that case, in which bugzilla should this be reported/tracked)?
- Does this ring a bell for anyone as a similar issue that has
  already been fixed in a more recent version of either the kernel or kvm?
- What would be the next steps? Collecting traces? Trying other kvm cpu
  flags/options?

I can of course either drop the guest OS into the interactive kernel
debugger to poke at values, or use gdb on the kvm process if that helps.

Note that the locks upon reboot are somewhat reproducible, but the locks
during normal operation are random, so I can't collect insane volumes of
traces.

Thanks for your attention,

Landry


