Anyone had time to take a little look at the kernel trace?
On 2017-11-15 04:08 PM, David Hill wrote:
Hi guys,
The same behavior is seen with 4.15-rc0 :
$ sudo virsh list
Id Name State
----------------------------------------------------
2 undercloud-0-pike in shutdown
$ uname -a
Linux zappa.orion 4.15.0-0.rc0.git1.1.fc28.x86_64 #1 SMP Mon Nov 13
19:54:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ ps aufxg | grep qemu
qemu 8595 81.6 12.8 22831200 16961240 ? D 11:44 214:26
/usr/bin/qemu-system-x86_64 -machine accel=kvm -name
guest=undercloud-0-pike,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/master-key.aes
-machine
pc-i440fx-2.10,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu
Westmere -m 16384 -realtime mlock=off -smp
4,sockets=4,cores=1,threads=1 -uuid
0de0918d-f67e-426a-91c1-5e38e86c96b3 -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-undercloud-0-pike/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet
-no-shutdown -global PIIX4_PM.disable_s3=1 -global
PIIX4_PM.disable_s4=1 -boot menu=on,strict=on -device
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
-device
ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
-device
ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
file=/var/lib/jenkins/VMs/undercloud-0-pike.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive if=none,id=drive-ide0-0-0,readonly=on -device
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=3
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:8e:41,bus=pci.0,addr=0x3,bootindex=1
-netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device
virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:b8:8b:a5,bus=pci.0,addr=0x8
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-spice
port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on
-device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
-device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
spicevmc,id=charredir0,name=usbredir -device
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 -chardev
spicevmc,id=charredir1,name=usbredir -device
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
Thank you very much,
David Hill
On 2017-11-14 10:44 AM, Dave Hill wrote:
I'm not able to bisect further as the system wont boot properly when
I reach that commit.
On Mon, Nov 13, 2017 at 10:54 AM, David Hill <dhill@xxxxxxxxxx
<mailto:dhill@xxxxxxxxxx>> wrote:
Hi guys,
Starting with kernel 4.14-rc1, my CI failed to completely
shutdown one of the
VMs and it stuck in "in shutdown" while sending this kernel message:
[ 7496.552971] INFO: task qemu-system-x86:5978 blocked for more
than 120 seconds.
[ 7496.552987] Tainted: G I
4.14.0-0.rc1.git3.1.fc28.x86_64 #1
[ 7496.552996] "echo 0 /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 7496.553006] qemu-system-x86 D12240 5978 1 0x00000004
[ 7496.553024] Call Trace:
[ 7496.553044] __schedule+0x2dc/0xbb0
[ 7496.553055] ? trace_hardirqs_on+0xd/0x10
[ 7496.553074] schedule+0x3d/0x90
[ 7496.553087] vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
[ 7496.553100] ? finish_wait+0x90/0x90
[ 7496.553115] vhost_net_ioctl+0x542/0x910 [vhost_net]
[ 7496.553144] do_vfs_ioctl+0xa6/0x6c0
[ 7496.553166] SyS_ioctl+0x79/0x90
[ 7496.553182] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 7496.553190] RIP: 0033:0x7fa1ea0e1817
[ 7496.553196] RSP: 002b:00007ffe3d854bc8 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
[ 7496.553209] RAX: ffffffffffffffda RBX: 000000000000001d RCX:
00007fa1ea0e1817
[ 7496.553215] RDX: 00007ffe3d854bd0 RSI: 000000004008af30 RDI:
0000000000000021
[ 7496.553222] RBP: 000055e33352b610 R08: 000055e33024a6f0 R09:
000055e330245d92
[ 7496.553228] R10: 000055e33344e7f0 R11: 0000000000000246 R12:
000055e33351a000
[ 7496.553235] R13: 0000000000000001 R14: 0000000400000000 R15:
0000000000000000
[ 7496.553284]
Showing all locks held in the system:
[ 7496.553313] 1 lock held by khungtaskd/161:
[ 7496.553319] #0: (tasklist_lock){.+.+}, at:
[<ffffffff8111740d>] debug_show_all_locks+0x3d/0x1a0
[ 7496.553373] 1 lock held by in:imklog/1194:
[ 7496.553379] #0: (&f->f_pos_lock){+.+.}, at:
[<ffffffff8130ecfc>] __fdget_pos+0x4c/0x60
[ 7496.553541] 1 lock held by qemu-system-x86/5978:
[ 7496.553547] #0: (&dev->mutex#3){+.+.}, at:
[<ffffffffc077e498>] vhost_net_ioctl+0x358/0x910 [vhost_net]
I'm currently bisecting to figure out which commit breaks this but
for some reasons, when hitting this commit:
# bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag
'wireless-drivers-next-for-davem-2017-08-07' of
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
<http://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next>
git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
the host will not allow SSHd to establish a new session and when
starting a KVM guest, the host will hard lock. I'm still
bisecting but I marked that commit as bad even though perhaps it
would be good.
Hopefully, this commit was a bad one and my bisection will
pinpoint which
commit broke the kernel. If you have an idea of which commit
might break
the system, please let me know which one I should test first.
Thank you very much,
David Hill
[1] https://bugzilla.kernel.org/show_bug.cgi?id=197861
<https://bugzilla.kernel.org/show_bug.cgi?id=197861>