Re: [REGRESSION][v6.8-rc1] virtio-pci: Introduce admin virtqueue

On a VM with the GCP kernel (where we first identified the problem), I see:

1. The full kernel log from `journalctl --system > kernlog` is attached.  The suspend section specifically is here:

May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd[1]: Reached target sleep.target - Sleep.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd[1]: Starting systemd-suspend.service - System Suspend...
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal systemd-sleep[1413]: Performing sleep operation 'suspend'...
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend entry (deep)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Filesystems sync: 0.008 seconds
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing user space processes
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing user space processes completed (elapsed 0.001 seconds)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: OOM killer disabled.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing remaining freezable tasks
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: printk: Suspending console(s) (use no_console_suspend to debug)
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: port 00:03:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0x130 returns -16
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: port 00:03:0.0: PM: failed to suspend: error -16
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: sd 0:0:1:0: [sda] Synchronizing SCSI cache
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: Some devices failed to suspend, or early wake event detected
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: OOM killer enabled.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: Restarting tasks ... done.
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: random: crng reseeded on system resumption
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend exit
May 08 11:08:42 kernel-test-202405080702.c.ubuntu-catred.internal kernel: PM: suspend entry (s2idle)
-- Boot 61828bc938b44fc68a8aeedc16a23a9d --
May 08 11:09:03 localhost kernel: Linux version 6.8.0-1007-gcp (buildd@lcy02-amd64-079) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #7-Ubuntu SMP Sat Apr 20 00:58:31 UTC 2024 (Ubuntu 6.8.0-1007.7-gcp 6.8.1)
May 08 11:09:03 localhost kernel: Command line: BOOT_IMAGE=/vmlinuz-6.8.0-1007-gcp root=PARTUUID=7a949935-6bf2-4cae-b404-803c95163572 ro console=ttyS0,115200 panic=-1

2. The features the devices have:

catred@kernel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio0/features
0110000000000000000000000000010000000000000000000000000000000000
catred@kernel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio1/features
1110010110011001110000100000010000000000000000000000000000000000
catred@kernel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio2/features
1110000000000000000000000000000000000000000000000000000000000000
catred@kernel-test-202405080702:~$ cat /sys/bus/virtio/devices/virtio3/features
0000000000000000000000000000000000000000000000000000000000000000
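In case it helps when reading these strings: as far as I understand the sysfs `features` file, character i of the string is feature bit i ('1' if set). A minimal sketch for turning such a string into bit numbers follows. The helper names are hypothetical, and the two named bits (32 and 41) are my reading of the virtio feature-bit definitions, so please treat that small table as an assumption rather than a complete list.

```python
# Hypothetical helper: decode a virtio sysfs "features" string.
# Assumption: character i in the string corresponds to feature bit i.

# Assumed names for two device-independent feature bits; not a complete table.
NAMED_BITS = {
    32: "VIRTIO_F_VERSION_1",
    41: "VIRTIO_F_ADMIN_VQ",
}


def set_feature_bits(features):
    """Return the indices of all bits set to '1' in the features string."""
    return [i for i, ch in enumerate(features.strip()) if ch == "1"]


def describe(features):
    """Return a readable label for every set bit, falling back to 'bit N'."""
    return [NAMED_BITS.get(bit, f"bit {bit}") for bit in set_feature_bits(features)]


if __name__ == "__main__":
    import sys

    # Usage: python3 decode_features.py /sys/bus/virtio/devices/virtio0/features
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, describe(fh.read()))
```

Running it against each `/sys/bus/virtio/devices/virtioN/features` path prints the set bits per device, which makes it easier to check whether a given bit is offered.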

Catherine

On Tue, May 7, 2024 at 11:34 PM Jason Wang <jasowang@xxxxxxxxxx> wrote:
On Sat, May 4, 2024 at 2:10 AM Joseph Salisbury
<joseph.salisbury@xxxxxxxxxxxxx> wrote:
>
> Hi Feng,
>
> During testing, a kernel bug was identified with the suspend/resume
> functionality on instances running in a public cloud [0].  This bug is a
> regression introduced in v6.8-rc1.  After a kernel bisect, the following
> commit was identified as the cause of the regression:
>
>         fd27ef6b44be  ("virtio-pci: Introduce admin virtqueue")

From a quick glance at the patch, it seems it should not break
freeze/restore, as it should behave as it did in the past.

But I found something interesting:

1) it assumes a single admin vq, which is not what the spec says
2) it has a special function for the admin virtqueue during
freeze/restore, but that function does nothing more than del_vq()
3) it lacks real users, but I guess e.g. destroy_avq() needs to be
synchronized with whoever is using the admin virtqueue

>
> I was hoping to get your feedback, since you are the patch author. Do
> you think gathering any additional data will help diagnose this issue?

Yes, please show us

1) the kernel log here.
2) the features that the device has, e.g. the contents of
/sys/bus/virtio/devices/virtio0/features
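
For anyone collecting the same data, here is a minimal sketch for pulling the device name and error code out of a saved journal dump. It assumes the "PM: failed to suspend: error -N" line format that the PM core prints; the function name and regular expression are hypothetical, and the errno lookup uses Python's standard `errno` table.

```python
import errno
import re

# Matches kernel log lines of the assumed form:
#   ... kernel: <device>: PM: failed to suspend: error -<N>
FAILED_RE = re.compile(
    r"kernel: (?P<dev>.+?): PM: failed to suspend: error -(?P<code>\d+)"
)


def failed_suspends(log_text):
    """Return (device, errno number, errno name) for each failed suspend line."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            code = int(match.group("code"))
            name = errno.errorcode.get(code, "UNKNOWN")
            results.append((match.group("dev"), code, name))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python3 failed_suspends.py kernlog
    with open(sys.argv[1], errors="replace") as fh:
        for dev, code, name in failed_suspends(fh.read()):
            print(f"{dev}: -{code} ({name})")
```

An error of -16 maps to EBUSY in this table, so the output shows directly which device refused to suspend and why.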

> Other virtio commits depend on this commit, so a revert test
> is not really straightforward without reverting all the dependencies.
> Any ideas you have would be greatly appreciated.

Thanks

>
>
> Thanks,
>
> Joe
>
> [0] http://pad.lv/2063315
>

Attachment: kernlog20240508_0714
Description: Binary data

