[Bug 216645] New: Fence fallback timer expired on ring gfx

https://bugzilla.kernel.org/show_bug.cgi?id=216645

            Bug ID: 216645
           Summary: Fence fallback timer expired on ring gfx
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.15.0-43-generic
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@xxxxxxxxxxxxxxxxxxxx
          Reporter: ask4support@xxxxxxxx
        Regression: No

Created attachment 303109
  --> https://bugzilla.kernel.org/attachment.cgi?id=303109&action=edit
Kernel log created by the script in the menu entry

Sometimes when I run KDE System Monitor or Chrome, my laptop freezes and
does not recover until I reboot (after a while I can move the mouse cursor,
but nothing else responds).
I'm using a Dell G5 SE 5505 with an AMD Ryzen 7 4800H CPU, Radeon RX Vega 7 as
the iGPU and AMD Radeon RX 5600M as the dGPU.

I've searched through existing bugs and found that the problem might be
related to interrupts. With that in mind, I compiled a list of kernel
parameters that might be relevant and tested all of them:

PW = Probably Working (no freeze during testing), NW = Not Working (still
freezes), NB = Not Booting
PW      pcie_port_pm=off
PW      amdgpu.msi=0
NW      amd_iommu=fullflush
NW      amd_iommu=force_isolation
NW      amd_iommu=off
NW      amd_iommu_intr=legacy
NW      amd_iommu_intr=vapic kvm-amd.avic=1
NW      iommu=off
NW      iommu=force
NW      iommu=noforce
NW      iommu=biomerge
NW      iommu=merge
NW      iommu=nomerge
NW      iommu=forcesac
NW      iommu=soft
NW      iommu=pt
NW      irqfixup
NW      irqpoll
NW      nointremap
NW      pcie_port_pm=force
NW      amdgpu.pcie_gen2=1
NW      amdgpu.pcie_gen2=0
NW      amdgpu.msi=1
NW      amdgpu.lockup_timeout=1000
NW      amdgpu.lockup_timeout=100
NW      amdgpu.aspm=1
NW      amdgpu.aspm=0
NW      amdgpu.bapm=1
NW      amdgpu.bapm=0
NW      amdgpu.ppfeaturemask=0xfff7bff7
NW      amdgpu.ppfeaturemask=0xfff7bdff
NW      amdgpu.ppfeaturemask=0xfff7bbff
NW      amdgpu.ppfeaturemask=0xfff73fff
NW      amdgpu.ppfeaturemask=0xfff3bfff
NW      amdgpu.exp_hw_support=1
NW      amdgpu.exp_hw_support=0
NW      amdgpu.forcelongtraining=0
NW      amdgpu.forcelongtraining=1
NW      amdgpu.cg_mask=0x00000000
NW      amdgpu.cg_mask=0xffffffff
NW      amdgpu.pg_mask=0xffffffff
NW      amdgpu.ngg=1
NW      amdgpu.ngg=0
NW      amdgpu.job_hang_limit=1000
NW      amdgpu.job_hang_limit=100
NW      amdgpu.lbpw=1
NW      amdgpu.lbpw=0
NW      amdgpu.gpu_recovery=1
NW      amdgpu.gpu_recovery=0
NW      amdgpu.sched_policy=2
NW      amdgpu.sched_policy=1
NW      amdgpu.sched_policy=0
NW      amdgpu.ignore_crat=0
NW      amdgpu.ignore_crat=1
NW      amdgpu.ras_enable=0
NW      amdgpu.ras_enable=1
NW      amdgpu.async_gfx_ring=0
NW      amdgpu.async_gfx_ring=1
NW      amdgpu.mcbp=1
NW      amdgpu.mcbp=0
NW      amdgpu.mes=0
NW      amdgpu.mes_kiq=1
NW      amdgpu.mes_kiq=0
NW      amdgpu.reset_method=0
NW      amdgpu.reset_method=1
NW      amdgpu.reset_method=2
NW      amdgpu.reset_method=3
NW      amdgpu.reset_method=4
NW      amdgpu.reset_method=-1
NW      idle=nomwait
NB      amdgpu.pg_mask=0x00000000
NB      amdgpu.mes=1
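
When repeating such tests, it helps to confirm that the parameter under test
really reached the running kernel. Below is a minimal sketch, not part of the
original test setup; the helper name has_param is made up for illustration.
It only checks for an exact, whole-word match on the command line.

```shell
#!/bin/sh
# has_param CMDLINE PARAM (hypothetical helper): succeed if PARAM appears
# as a whole, space-separated word in the string CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the command line of the running kernel; stays empty if unavailable.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

# Report the two parameters marked PW in the list above.
for p in pcie_port_pm=off amdgpu.msi=0; do
    if has_param "$cmdline" "$p"; then
        echo "$p is active"
    else
        echo "$p is not active"
    fi
done
```

On Ubuntu-family systems, a parameter is typically made persistent by adding
it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub.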



I've written a script and a GRUB2 menu entry for live Kubuntu that trigger
the freeze and save the dmesg output to a file called
Freeze_Dell_G5_SE_5505.sh.log at the root of the drive the system was booted
from. If your ISO file is not in the root directory of the drive, or is a
different version, replace the value of the ISO variable with its path:

menuentry "Start Kubuntu 22.04.1 (64 bit) without Ubiquity and with a freezing
script" {
        ISO=/kubuntu-22.04.1-desktop-amd64.iso
        set gfxpayload=keep
        loopback loop "$ISO"
        probe -u $root --set=rootid
        linux   (loop)/casper/vmlinuz   iso-scan/filename="$ISO"
file=/cdrom/preseed/kubuntu.seed maybe-ubiquity quiet splash init=/bin/sh -- -c
'for script in /home/kubuntu/Desktop/Freeze_Dell_G5_SE_5505.sh ; do for autorun
in /home/kubuntu/.config/autostart/${script##*/} ; do ln -fs /dev/null
/etc/systemd/system/graphical.target.wants/ubiquity.service ; mkdir -p
${script%/*} ${autorun%/*} ; printf
\043!_/bin/sh++print\050\051_{+\tprintf_"@1"_,_seq_-s"_"_@\050\050_@\050stty_size_\074_@t_?_sed_"s/^/\050/,_s/_/_-_1_\051_*_/"\051_-_@{\0431}_\051\051_?_sed_s/[0-9]//g+}+t\075"@\050readlink_/proc/self/fd/0\051"++d\075"@\050env_LANG\075C_udisksctl_mount_-b_/dev/disk/by-uuid/$0_-o_sync_2\076_/dev/null_?_sed_"s/^Mounted_.*_at_//g,_s/\\.@//g"\051"+[_-d_"@d"_]_\046\046_f\075oflag\075direct_??_d\075"@{0%%/*}"+sudo_dmesg_-w_?_sudo_dd_of\075"@d/@{0\043\043*/}.log"_@f_\046+i\0750+seq_28_150000_?_while_read_N_,_do+\tprint_@N+\ttimeout_3_env_DISPLAY\075:0_plasma-systemmonitor_\076_/dev/null_2\076\0461+\tn\075@N_,_while_[_0_-lt_@n_]_,_do+\t\tsleep_1+\t\tn\075@\050\050_@n_-_1_\051\051+\t\ti\075@\050\050_@i_^_1_\051\051+\t\t[_"@i"_\075_1_]_\046\046_printf_"\\33[30m\\33[47m"_??_printf_"\\33[37m\\33[40m"+\t\tprint_@n+\tdone+done++echo_END!+exit+
| tr _,?@+ \40\73\174\044\n > $script ; printf
[Desktop_Entry]\nType=Application\nExec=kstart_--maximize_--_konsole_-e_  | tr
_ \40 > ${autorun%.sh}.desktop ; printf $script\n >> ${autorun%.sh}.desktop ;
chmod +x $script ${autorun%.sh}.desktop ; chown -R kubuntu:kubuntu
/home/kubuntu ; exec /sbin/init maybe-ubiquity splash --- ; done ; done'
$rootid
        initrd  (loop)/casper/initrd
}
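
Once a log has been saved, a quick way to see whether a boot hit the problem
is to count the timeout messages in it. The sketch below assumes the kernel
message contains the same text as the bug summary; the function name
count_fence_timeouts is made up for illustration.

```shell
#!/bin/sh
# count_fence_timeouts FILE (hypothetical helper): print how many lines of
# FILE contain the fence timeout message from the bug summary.
count_fence_timeouts() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'Fence fallback timer expired on ring' "$1" || true
}

# Default to the log name used in this report; pass another path as $1.
log=${1:-Freeze_Dell_G5_SE_5505.sh.log}
if [ -f "$log" ]; then
    echo "fence timeouts in $log: $(count_fence_timeouts "$log")"
fi
```

The pattern deliberately stops at "ring" so that timeouts on rings other than
gfx are counted as well.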



The script generated on the live Kubuntu desktop runs KDE's System Monitor
for three seconds and then waits before running it again. With each iteration,
it waits one second longer than before. A parameter passed the test if the
system did not freeze before the script reached a 50-second wait (I'd now
recommend 60, as with 50 it sometimes froze after the second boot) on five
boots in a row.
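
For readers who do not want to decode the menu entry, the loop described
above can be sketched roughly as follows. This is a simplified illustration,
not the exact script from the menu entry: the APP and SLEEP variables and the
run_cycles function are assumptions added here so the loop can be dry-run,
and the log capture and screen-colour toggling are left out.

```shell
#!/bin/sh
# APP and SLEEP are hypothetical override hooks (not in the original script).
APP=${APP:-plasma-systemmonitor}
SLEEP=${SLEEP:-sleep}

# run_cycles FIRST LAST: for each wait time N from FIRST to LAST seconds,
# run APP for at most 3 seconds, then count down N seconds one at a time.
run_cycles() {
    n=$1
    while [ "$n" -le "$2" ]; do
        echo "cycle with $n s wait"
        # timeout stops APP after 3 seconds; ignore its exit status.
        timeout 3 env DISPLAY=:0 "$APP" >/dev/null 2>&1 || true
        left=$n
        while [ "$left" -gt 0 ]; do
            "$SLEEP" 1
            left=$((left - 1))
        done
        n=$((n + 1))
    done
}

# Real use blocks for a long time, for example:
#   run_cycles 28 60
```

The encoded script appears to start its wait at 28 seconds (seq 28 ...), so
the example call uses that value together with the recommended limit of 60.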

Could someone also tell us which workaround should be used under which
performance/latency requirements? ("Maybe wrong but still an" EXAMPLE: Users
who need the best performance or lowest latency should use pcie_port_pm=off,
users who need the best battery life should use amdgpu.msi=0.)

If you fix the issue, could you please tell the users (not just developers)
what the problem was? ("Maybe wrong but still an" EXAMPLE: The driver was
waiting for an interrupt, but the bus was down, so the message-signalled
interrupt could not arrive and the operation timed out.)

Thanks.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


