On 23.03.2015 07:14, Carsten Emde wrote:
> Hi Michel,
>
>>>>> [..]
>>>>> The most striking problem of kernel 3.18.9-rt4 affects all systems
>>>>> that are equipped with Radeon graphics (irrespective whether PCIe
>>>>> cards or APUs with on-chip graphics). They suffer from a hanging
>>>>> radeon driver. The block occurs when accelerated graphics load is
>>>>> created by x11perf or gltestperf. Sometimes only the graphics are
>>>>> frozen while ssh login still is possible, sometimes the entire box
>>>>> is no longer accessible at all. In any case, a reboot is needed to
>>>>> recover from this situation.
>>>>>
>>>>> Here is a selection of kernel messages:
>>>> [...]
>>>> The commits from
>>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=f957063fee6392bb9365370db6db74dc0b2dce0a
>>>>
>>>> to
>>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>>>
>>>> and
>>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=b6610101718d4ab90d793c482625e98eb1262cad
>>>>
>>>> might help for this.
>>>
>>> Thanks a lot. I have applied these patches to a number of systems:
>>> # quilt applied | tail -7
>>> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
>>> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
>>> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
>>>
>>> The graphic boards still crash and freeze the screen, but in contrast
>>> to the earlier situation the systems remain accessible, and the X
>>> Window server can be restarted after the offensive programs are
>>> removed.
>>> The crashes were reliably triggered by
>>> - gltestperf
>>> or
>>> - x11perf -repeat 3 -subs 25 -time 2 -rect10
> This is not entirely correct, since gltestperf does not reliably crash
> the graphics controller. However, "x11perf -repeat 3 -subs 25 -time 2
> -rect10" always does a reliable job to trigger the crash.
>
>>> but the crashes also occur several times per day during normal work
>>> such as browsing the Internet or writing a text document. If you wish
>>> me to provide additional diagnostic information such as running test
>>> programs while the graphic boards are unresponsive, I certainly can do
>>> that.
>>
>> Does it also happen with a kernel built from a current drm-fixes tree?
>> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes
> No. Apparently, you need full preemption to expose the problem.
>
> The following list contains the results whether the command "x11perf
> -repeat 3 -subs 25 -time 2 -rect10" freezes the Radeon board under test
> (Radeon HD 7970 XFS / R9 280X) or not:
> linux-3.12.33-rt47               no
> linux-3.14.34-rt32               no
> linux-3.14.34-drm-3.16.7-rt32*   no
> linux-3.18.7-rt1                 YES
> linux-3.18.9-rt4                 YES
> linux-3.18.9-rt5                 YES
> linux-3.18.9-drm-3.16.7-rt5**    no
> linux-4.0.0-rc4                  no
> linux-drm-fixes                  no
> *DRM subsystem backported from linux-3.16.7 to linux-3.14.34-rt32.
> **DRM subsystem ported from linux-3.16.7 to linux-3.18.9-rt5.

Can you test a non-rt 3.18.y kernel? There were some intermittent issues
around 3.18 fixed by the patches I referenced above. Maybe I missed some
other fixes, though.

Maarten, do you remember any other fixes offhand that might help?

> More observations:
> If full function tracing is enabled (which makes the system about five
> times slower), the graphics controller no longer freezes. With partial
> function tracing such as "echo *drm* >set_ftrace_filter", the
> controller still freezes. The trace then contains vblank interrupt
> processing only, ioctls are no longer executed.
>
> This is the location where the driver hangs:
> [25104.509258] INFO: task Xorg.bin:16591 blocked for more than 120 seconds.
> [25104.516322] Not tainted 3.18.9-rt5 #2
> [25104.520715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [25104.528853] Xorg.bin D ffffffff8171ed90 0 16591 16239 0x10400080
> [25104.536102] ffff8800ba0bb8d8 0000000000000002 ffff8800ba0bbfd8 0000000000000006
> [25104.536103] 000000000000dc08 ffff880626d0dc08 ffff8800ba0bbfd8 000000000000dc08
> [25104.536104] ffff88061b2cdcd0 ffff880616d3a940 ffff880035c10000 ffff880616d3a940
> [25104.559274] Call Trace:
> [25104.561844] [<ffffffff8171bb54>] schedule+0x34/0xa0
> [25104.561846] [<ffffffff8171e2ac>] schedule_timeout+0x23c/0x2a0
> [25104.561870] [<ffffffffa00e3ab6>] ? radeon_fence_process+0x16/0x40 [radeon]
> [25104.561879] [<ffffffffa00e3b24>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
> [25104.561887] [<ffffffffa00e3e97>] radeon_fence_wait_seq_timeout.constprop.8+0x327/0x380 [radeon]
> [25104.561889] [<ffffffff810d19c0>] ? __wake_up_sync+0x20/0x20
> [25104.561898] [<ffffffffa00e4287>] radeon_fence_wait_any+0x57/0x70 [radeon]
> [25104.561914] [<ffffffffa015a36f>] radeon_sa_bo_new+0x2af/0x4b0 [radeon]
> [25104.561916] [<ffffffff81379b07>] ? debug_smp_processor_id+0x17/0x20
> [25104.561918] [<ffffffff811d0b4a>] ? __kmalloc+0x8a/0x300
> [25104.561932] [<ffffffffa01b2197>] radeon_ib_get+0x37/0xe0 [radeon]
> [25104.561943] [<ffffffffa01003ee>] radeon_cs_ioctl+0x22e/0x860 [radeon]
> [25104.561952] [<ffffffffa0005bc7>] drm_ioctl+0x197/0x670 [drm]
> [25104.561954] [<ffffffff81379b07>] ? debug_smp_processor_id+0x17/0x20
> [25104.561956] [<ffffffff810901ba>] ? unpin_current_cpu+0x1a/0x80
> [25104.561959] [<ffffffff810ba200>] ? migrate_enable+0x90/0x1a0
> [25104.561966] [<ffffffffa00c604c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
> [25104.561967] [<ffffffff811fdb88>] do_vfs_ioctl+0x2c8/0x4c0
> [25104.561969] [<ffffffff81208a92>] ? __fget+0x72/0xb0
> [25104.561970] [<ffffffff811fde01>] SyS_ioctl+0x81/0xa0
> [25104.561971] [<ffffffff8171f99e>] tracesys_phase2+0xd4/0xd9
>
> Conclusion:
> An upgrade change of the DRM subsystem between 3.16.7 and 3.18.9
> introduced a race condition that freezes Radeon graphics. It requires
> full preemption to be exposed reliably.

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel