On 9/21/23 23:06, Dmitry Osipenko wrote: > On 9/19/23 05:59, David Stevens wrote: >> On Mon, Sep 18, 2023 at 8:19 PM Dmitry Osipenko >> <dmitry.osipenko@xxxxxxxxxxxxx> wrote: >>> >>> On 9/18/23 12:58, Dmitry Osipenko wrote: >>>> On 9/11/23 05:16, David Stevens wrote: >>>>> From: David Stevens <stevensd@xxxxxxxxxxxx> >>>>> >>>>> Handle non-refcounted pages in __kvm_faultin_pfn. This allows the host >>>>> to map memory into the guest that is backed by non-refcounted struct >>>>> pages - for example, the tail pages of higher order non-compound pages >>>>> allocated by the amdgpu driver via ttm_pool_alloc_page. >>>>> >>>>> The bulk of this change is tracking the is_refcounted_page flag so that >>>>> non-refcounted pages don't trigger page_count() == 0 warnings. This is >>>>> done by storing the flag in an unused bit in the sptes. There are no >>>>> bits available in PAE SPTEs, so non-refcounted pages can only be handled >>>>> on TDP and x86-64. >>>>> >>>>> Signed-off-by: David Stevens <stevensd@xxxxxxxxxxxx> >>>>> --- >>>>> arch/x86/kvm/mmu/mmu.c | 52 +++++++++++++++++++++++---------- >>>>> arch/x86/kvm/mmu/mmu_internal.h | 1 + >>>>> arch/x86/kvm/mmu/paging_tmpl.h | 8 +++-- >>>>> arch/x86/kvm/mmu/spte.c | 4 ++- >>>>> arch/x86/kvm/mmu/spte.h | 12 +++++++- >>>>> arch/x86/kvm/mmu/tdp_mmu.c | 22 ++++++++------ >>>>> include/linux/kvm_host.h | 3 ++ >>>>> virt/kvm/kvm_main.c | 6 ++-- >>>>> 8 files changed, 76 insertions(+), 32 deletions(-) >>>> >>>> Could you please tell which kernel tree you used for the base of this >>>> series? This patch #6 doesn't apply cleanly to stable/mainline/next/kvm >>>> >>>> error: sha1 information is lacking or useless (arch/x86/kvm/mmu/mmu.c). >>>> error: could not build fake ancestor >>> >>> I applied the patch manually to v6.5.2 and tested Venus using Intel TGL iGPU, the intel driver is crashing: >>> >>> BUG: kernel NULL pointer dereference, address: 0000000000000058 >>> #PF: supervisor read access in kernel mode >>> #PF: error_code(0x0000) - not-present page >>> PGD 0 P4D 0 >>> Oops: 0000 [#1] PREEMPT SMP >>> CPU: 1 PID: 5926 Comm: qemu-system-x86 Not tainted 6.5.2+ #114 >>> Hardware name: LENOVO 20VE/LNVNB161216, BIOS F8CN43WW(V2.06) 08/12/2021 >>> RIP: 0010:gen8_ppgtt_insert+0x50b/0x8f0 >>> Code: 00 00 f7 c2 00 00 20 00 74 15 f7 c3 ff ff 1f 00 75 0d 41 81 fc ff ff 1f 00 0f 87 0e 02 00 00 48 8b 74 24 08 44 89 c0 45 85 ed <48> 8b 4e 58 48 8b 04 c1 0f 85 0b 02 00 00 81 e2 00 00 01 00 0f 84 >>> RSP: 0018:ffffafc085afb820 EFLAGS: 00010246 >>> RAX: 0000000000000000 RBX: 00000000e9604000 RCX: 000000000000001b >>> RDX: 0000000000211000 RSI: 0000000000000000 RDI: ffff9513d44c1000 >>> RBP: ffff951106f8dfc0 R08: 0000000000000000 R09: 0000000000000003 >>> R10: 0000000000000fff R11: 00000000e9800000 R12: 00000000001fc000 >>> R13: 0000000000000000 R14: 0000000000001000 R15: 0000ffff00000000 >>> FS: 00007f2a5bcced80(0000) GS:ffff951a87a40000(0000) knlGS:0000000000000000 >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 0000000000000058 CR3: 0000000116f16006 CR4: 0000000000772ee0 >>> PKRU: 55555554 >>> Call Trace: >>> <TASK> >>> ? __die+0x1f/0x60 >>> ? page_fault_oops+0x14d/0x420 >>> ? exc_page_fault+0x3d7/0x880 >>> ? lock_acquire+0xc9/0x290 >>> ? asm_exc_page_fault+0x22/0x30 >>> ? gen8_ppgtt_insert+0x50b/0x8f0 >>> ppgtt_bind_vma+0x4f/0x60 >>> fence_work+0x1b/0x70 >>> fence_notify+0x8f/0x130 >>> __i915_sw_fence_complete+0x58/0x230 >>> i915_vma_pin_ww+0x513/0xa80 >>> eb_validate_vmas+0x17e/0x9e0 >>> ? eb_pin_engine+0x2bb/0x340 >>> i915_gem_do_execbuffer+0xc85/0x2bf0 >>> ? __lock_acquire+0x3b6/0x21c0 >>> i915_gem_execbuffer2_ioctl+0xee/0x240 >>> ? i915_gem_do_execbuffer+0x2bf0/0x2bf0 >>> drm_ioctl_kernel+0x9d/0x140 >>> drm_ioctl+0x1dd/0x410 >>> ? i915_gem_do_execbuffer+0x2bf0/0x2bf0 >>> ? __fget_files+0xc5/0x170 >>> __x64_sys_ioctl+0x8c/0xc0 >>> do_syscall_64+0x34/0x80 >>> entry_SYSCALL_64_after_hwframe+0x46/0xb0 >>> RIP: 0033:0x7f2a60b0c9df >>> >>> >>> $ ./scripts/faddr2line ./vmlinux gen8_ppgtt_insert+0x50b/0x8f0 >>> gen8_ppgtt_insert+0x50b/0x8f0: >>> i915_pt_entry at drivers/gpu/drm/i915/gt/intel_gtt.h:557 >>> (inlined by) gen8_ppgtt_insert_huge at drivers/gpu/drm/i915/gt/gen8_ppgtt.c:641 >>> (inlined by) gen8_ppgtt_insert at drivers/gpu/drm/i915/gt/gen8_ppgtt.c:743 >>> >>> It's likely should be the i915 driver issue that is crashes with the NULL deref, but the origin of the bug should be the kvm page fault handling. >>> >>> David, could you please tell what tests you've run and post a link to yours kernel tree? Maybe I made obscure mistake while applied the patch manually. >> >> For tests, I ran the kvm selftests and then various ChromeOS >> virtualization tests. Two things to note about the ChromeOS >> virtualization tests are that they use crosvm instead of qemu, and >> they use virtio-gpu+virgl for graphics in the guest. I tested on an >> AMD device (since the goal of this series is to fix a compatibility >> issue with the amdgpu driver), and on a TGL device. >> >> I don't have an easy way to share my kernel tree, but it's based on >> v6.5-r3. The series I sent out is rebased onto the kvm next branch, >> but there were only minor conflicts. > > It's good that you mentioned crosvm because I only tested qemu. > Interestingly, the problem doesn't present using crosvm. > > I made few more tests using these options: > > 1. yours latest version of the patch based on v6.5.4 > 2. yours latest version of the patch based on the kvm tree > 3. by forward-porting yours older version of the kvm patchset to v6.5 > that works well based on v6.4 kernel > > Neither of the options work with qemu and v6.5 kernel, but crosvm works. > > At first I felt confident that it must be i915 bug of the v6.5 kernel. > But given that crosvm works, now I'm not sure. > > Will try to bisect 6.5 kernel, at minimum the i915 driver changes. The bisection said that it's not i915 bug. Good news, the bug was fixed in a recent linux-next. Don't know what is the exact fix, suppose it's something in mm/ -- Best regards, Dmitry