On Tue, 2022-04-26 at 19:39 +0300, Maxim Levitsky wrote: > Just for reference, noticed today: > > [ 1355.788381] ------------[ cut here ]------------ > [ 1355.788820] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000cce298a6>] kvm_vcpu_block+0x54/0xa0 [kvm] > [ 1355.789843] WARNING: CPU: 0 PID: 8927 at kernel/sched/core.c:9662 __might_sleep+0x68/0x70 > [ 1355.790573] Modules linked in: kvm_amd(O) ccp kvm(O) irqbypass tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT uinput snd_seq_dummy snd_hrtimer ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat bridge rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace rfkill sunrpc vfat fat snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core rng_core snd_seq snd_seq_device input_leds snd_pcm joydev snd_timer snd lpc_ich mfd_core virtio_input efi_pstore pcspkr rtc_cmos button sch_fq_codel ext4 mbcache jbd2 hid_generic usbhid hid virtio_gpu virtio_dma_buf drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_blk virtio_net drm net_failover virtio_console failover i2c_core crc32_pclmul crc32c_intel xhci_pci xhci_hcd virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring dm_mirror dm_region_hash dm_log fuse ipv6 autofs4 [last unloaded: ccp] > [ 1355.797870] CPU: 0 PID: 8927 Comm: CPU 0/KVM Tainted: G W O 5.18.0-rc4.unstable #5 > [ 1355.798637] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > [ 1355.799295] RIP: 0010:__might_sleep+0x68/0x70 > [ 1355.799730] Code: e8 1d fe ff ff 41 5c 41 5d 5d c3 c6 05 5c b3 4c 01 01 90 48 8b 90 80 14 00 00 48 c7 c7 a8 4e 2c 82 48 89 d1 e8 b4 f8 8a 00 90 <0f> 0b 90 90 eb c8 66 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 be 08 > [ 1355.801313] RSP: 0018:ffffc90002183be8 EFLAGS: 00010282 > [ 1355.801783] RAX: 0000000000000000 RBX: ffffc90002183c40 RCX: 0000000000000000 > [ 1355.802402] RDX: ffff88846ce2c1c0 RSI: ffff88846ce1f7a0 RDI: ffff88846ce1f7a0 > [ 1355.803046] RBP: ffffc90002183bf8 R08: ffff88847df6efe8 R09: 0000000000013ffb > [ 1355.803690] R10: 0000000000000553 R11: ffff88847df0f000 R12: ffffffffa0921898 > [ 1355.804294] R13: 00000000000000a1 R14: 00000000003e0688 R15: ffffea0000000000 > [ 1355.804901] FS: 00007f630ffff640(0000) GS:ffff88846ce00000(0000) knlGS:0000000000000000 > [ 1355.805641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1355.806145] CR2: 0000000000000000 CR3: 000000011dc16000 CR4: 0000000000350ef0 > [ 1355.806759] Call Trace: > [ 1355.806971] <TASK> > [ 1355.807187] kvm_vcpu_map+0x159/0x190 [kvm] > [ 1355.807628] nested_svm_vmexit+0x4c/0x7f0 [kvm_amd] > [ 1355.808036] ? kvm_vcpu_block+0x54/0xa0 [kvm] > [ 1355.808450] svm_check_nested_events+0x97/0x390 [kvm_amd] > [ 1355.808920] kvm_check_nested_events+0x1c/0x40 [kvm] > [ 1355.809396] kvm_arch_vcpu_runnable+0x4e/0x190 [kvm] > [ 1355.809892] kvm_vcpu_check_block+0x4f/0x100 [kvm] > [ 1355.810349] ? kvm_vcpu_check_block+0x5/0x100 [kvm] > [ 1355.810806] ? kvm_vcpu_block+0x54/0xa0 [kvm] > [ 1355.811259] kvm_vcpu_block+0x6b/0xa0 [kvm] > [ 1355.811666] kvm_vcpu_halt+0x3f/0x490 [kvm] > [ 1355.812049] kvm_arch_vcpu_ioctl_run+0xb0b/0x1d00 [kvm] > [ 1355.812539] ? rcu_read_lock_sched_held+0x16/0x80 > [ 1355.813013] ? lock_release+0x1c4/0x270 > [ 1355.813365] ? __wake_up_common+0x8d/0x180 > [ 1355.813743] ? _raw_spin_unlock_irq+0x28/0x40 > [ 1355.814139] kvm_vcpu_ioctl+0x289/0x750 [kvm] > [ 1355.814537] ? kvm_vcpu_ioctl+0x289/0x750 [kvm] > [ 1355.814963] ? lock_release+0x1c4/0x270 > [ 1355.815321] ? __fget_files+0xe1/0x1a0 > [ 1355.815690] __x64_sys_ioctl+0x8e/0xc0 > [ 1355.816019] do_syscall_64+0x36/0x80 > [ 1355.816336] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 1355.816776] RIP: 0033:0x7f632cc340ab > [ 1355.817109] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9d bd 0c 00 f7 d8 64 89 01 48 > [ 1355.818713] RSP: 002b:00007f630fffd5c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 1355.819381] RAX: ffffffffffffffda RBX: 0000560c199db3e0 RCX: 00007f632cc340ab > [ 1355.820006] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000024 > [ 1355.820610] RBP: 00007f630fffd6c0 R08: 0000560c18a71e60 R09: 000000000000ffff > [ 1355.821238] R10: 0000560c185cac1a R11: 0000000000000246 R12: 00007fff68dfeb0e > [ 1355.821866] R13: 00007fff68dfeb0f R14: 0000000000000000 R15: 00007f630ffff640 > [ 1355.822490] </TASK> > [ 1355.822690] irq event stamp: 0 > [ 1355.823011] hardirqs last enabled at (0): [<0000000000000000>] 0x0 > [ 1355.823817] hardirqs last disabled at (0): [<ffffffff811339cb>] copy_process+0x94b/0x1ec0 > [ 1355.824820] softirqs last enabled at (0): [<ffffffff811339cb>] copy_process+0x94b/0x1ec0 > [ 1355.825856] softirqs last disabled at (0): [<0000000000000000>] 0x0 > [ 1355.826633] ---[ end trace 0000000000000000 ]--- > > > I never liked the fact that we map the vmcb12 on vmexit, we should keep it mapped from nested vmentry to nested vmexit. > I can make a patch for this. > > Also injecting #GP if either of the maps fail is no doubt wrong - something I know for a while but never got to fix it. Actually for vmrun, that #GP is I think sort of correct now - that is what AMD cpus do on 'invalid physical address', but for VM exit, we just need to have the vmcb mapped instead of mapping it again - injecting #GP at that point which will go to the nested guest is just wrong. Best regards, Maxim Levitsky > > Best regards, > Maxim Levitsky