Hi,

On 29.01.2025 15:09, Rodrigo Vivi wrote:
> On Tue, Jan 28, 2025 at 08:54:10AM +0000, MARDI Youness wrote:
>> Hello,
>>
>> Could you help us on this issue:
>> [1]https://github.com/intel/linux-intel-lts/issues/54

Once you have enabled all VFs, try to capture all SR-IOV provisioning
details and attach them to [1]. You may use something like:

  $ grep . -r /sys/class/drm/card0/iov

Also attach the full dmesg and the GuC log taken right after the failure.
For a larger GuC log buffer, please select CONFIG_DRM_I915_DEBUG_GUC
and use the modparam i915.guc_log_level=4

You can also try the following (once VFs are enabled, but before starting VMs):
 - set explicit "execution_quantum_ms" for PF and all VFs to 20
 - set explicit "preemption_timeout_us" for PF and all VFs to 20000
 - enable "engine_reset" policy

  $ echo 20 > /sys/class/drm/card0/iov/pf/gt0/execution_quantum_ms
  $ echo 20 > /sys/class/drm/card0/iov/vf1/gt0/execution_quantum_ms
  ...
  $ echo 1 > /sys/class/drm/card0/iov/pf/gt0/policies/engine_reset

>>
>> Host environment
>>
>> Operating system: Gentoo Base System release 2.14
>> OS/kernel version:
>> [2]https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
>
> https://github.com/intel/linux-intel-lts/blob/lts-v6.6.34-linux-240626T131354Z/drivers/gpu/drm/i915/README.sriov
>
> Michal, could you please help here?
>
> Thanks,
> Rodrigo.
>
>> Architecture: x86_64
>> QEMU flavor: qemu-system-x86_64
>> QEMU version: latest qemu (master branch)
>> CPU: 12th Gen Intel(R) Core(TM) i7-1270P
>> igpu: Alder Lake-P
>> firmware:
>> [3]https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz
>>
>> Emulated/Virtualized environment
>>
>> Operating system: Windows 10 21H1
>>
>> Description of problem
>>
>> After setting up SR-IOV (kernel compilation, kernel cmdline, vfio-pci
>> driver attribution to the new pci..)
>> I've got my two new pci.
>>
>> 00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P
>> Integrated Graphics Controller (rev 0c)
>>
>> DeviceName: Onboard IGD
>>
>> Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics
>> Controller
>>
>> Kernel driver in use: i915
>>
>> 00:02.1 VGA compatible controller: Intel Corporation Alder Lake-P
>> Integrated Graphics Controller (rev 0c)
>>
>> Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics
>> Controller
>>
>> Kernel driver in use: vfio-pci
>>
>> 00:02.2 VGA compatible controller: Intel Corporation Alder Lake-P
>> Integrated Graphics Controller (rev 0c)
>>
>> Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics
>> Controller
>>
>> Kernel driver in use: vfio-pci
>>
>> I gave one of those pci to my VM with this qemu cmdline:
>>
>> -cpu
>> host,migratable=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-passthrough,hv-vendor-id=IrisXE
>>
>> ...
>>
>> -device
>> vfio-pci-nohotplug,host=0000:00:02.1,id=hostdev0,bus=pci.4,addr=0x0
>>
>> Sometimes it working properly when I start the qemu cmdline but most of
>> the time I've got those kernel errors and a GPU hang:
>>
>> kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB
>> invalidation response timed out for seqno 9679
>>
>> kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB
>> invalidation response timed out for seqno 9679
>>
>> kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation
>> response timed out for seqno 9679
>>
>> kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation
>> response timed out for seqno 9679
>>
>> ....
>>
>> kernel Fence expiration time out
>> i915-0000:00:02.0:renderThread22381:6e0!
>>
>> kernel i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin
>> version 70.13.1
>>
>> kernel i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin
>> version 7.9.3
>>
>> kernel i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all
>> workloads
>>
>> kernel i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
>>
>> kernel i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
>>
>> kernel [ 2730.991019] i915 0000:00:02.0: [drm] GPU HANG: ecode
>> 12:1:85dfbfff, in renderThread [22381]
>>
>> kernel [ 2730.991084] i915 0000:00:02.0: [drm] renderThread22381
>> context reset due to GPU hang
>>
>> It mostly appears when Qemu is starting..
>> Any help would be appreciated, thanks a lot
>>
>> Best Regards,
>>
>> Youness MARDI
>>
>> C2 – Usage restreint
>>
>> References
>>
>> Visible links
>> 1. https://github.com/intel/linux-intel-lts/issues/54
>> 2. https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
>> 3. https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz
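
P.S. The tuning steps suggested in the reply (explicit execution quantum and preemption timeout for PF and all VFs, plus the engine_reset policy) could be scripted roughly as below. This is only a sketch under assumptions: the function name apply_iov_sched_params is made up for illustration, and the per-function preemption_timeout_us path is assumed to sit next to execution_quantum_ms under gt0 (only the execution_quantum_ms and policies/engine_reset paths are spelled out above). Attributes that do not exist are skipped with a message instead of being guessed at.

```shell
#!/bin/sh
# Sketch: apply the suggested scheduling parameters to the PF and every VF.
# Assumption: each function directory under the iov root (pf, vf1, vf2, ...)
# exposes gt0/execution_quantum_ms and gt0/preemption_timeout_us; only the
# execution_quantum_ms and policies/engine_reset paths are given in the thread.
# apply_iov_sched_params is a hypothetical helper name.

apply_iov_sched_params() {
    iov_root=${1:-/sys/class/drm/card0/iov}
    eq_ms=${2:-20}       # execution quantum in milliseconds
    pt_us=${3:-20000}    # preemption timeout in microseconds

    # PF first, then every VF directory that exists.
    for fn in "$iov_root"/pf "$iov_root"/vf*; do
        # An unmatched vf* glob stays literal and is skipped here.
        [ -d "$fn/gt0" ] || continue
        for pair in "execution_quantum_ms:$eq_ms" "preemption_timeout_us:$pt_us"; do
            attr=${pair%%:*}
            val=${pair#*:}
            if [ -e "$fn/gt0/$attr" ]; then
                echo "$val" > "$fn/gt0/$attr" || return 1
            else
                echo "skip: $fn/gt0/$attr not found" >&2
            fi
        done
    done

    # The engine_reset policy is written on the PF only, as in the reply.
    policy="$iov_root/pf/gt0/policies/engine_reset"
    if [ -e "$policy" ]; then
        echo 1 > "$policy" || return 1
    fi
}
```

Run it as root once the VFs are enabled and before starting any VM, e.g. "apply_iov_sched_params /sys/class/drm/card0/iov 20 20000", then re-run the grep over the iov directory to confirm the values took effect.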