On 2020-03-13 1:35 p.m., Kazlauskas, Nicholas wrote: > On 2020-03-12 10:32 a.m., Alex Deucher wrote: >> On Thu, Mar 5, 2020 at 4:21 PM Mario Kleiner >> <mario.kleiner.de@xxxxxxxxx> wrote: >>> >>> Commit '16f17eda8bad ("drm/amd/display: Send vblank and user >>> events at vsartup for DCN")' introduces a new way of pageflip >>> completion handling for DCN, and some trouble. >>> >>> The current implementation introduces a race condition, which >>> can cause pageflip completion events to be sent out one vblank >>> too early, thereby confusing userspace and causing flicker: >>> >>> prepare_flip_isr(): >>> >>> 1. Pageflip programming takes the ddev->event_lock. >>> 2. Sets acrtc->pflip_status == AMDGPU_FLIP_SUBMITTED >>> 3. Releases ddev->event_lock. >>> >>> --> Deadline for surface address regs double-buffering passes on >>> target pipe. >>> >>> 4. dc_commit_updates_for_stream() MMIO programs the new pageflip >>> into hw, but too late for current vblank. >>> >>> => pflip_status == AMDGPU_FLIP_SUBMITTED, but flip won't complete >>> in current vblank due to missing the double-buffering deadline >>> by a tiny bit. >>> >>> 5. VSTARTUP trigger point in vblank is reached, VSTARTUP irq fires, >>> dm_dcn_crtc_high_irq() gets called. >>> >>> 6. Detects pflip_status == AMDGPU_FLIP_SUBMITTED and assumes the >>> pageflip has been completed/will complete in this vblank and >>> sends out pageflip completion event to userspace and resets >>> pflip_status = AMDGPU_FLIP_NONE. >>> >>> => Flip completion event sent out one vblank too early. >>> >>> This behaviour has been observed during my testing with measurement >>> hardware a couple of time. >>> >>> The commit message says that the extra flip event code was added to >>> dm_dcn_crtc_high_irq() to prevent missing to send out pageflip events >>> in case the pflip irq doesn't fire, because the "DCH HUBP" component >>> is clock gated and doesn't fire pflip irqs in that state. Also that >>> this clock gating may happen if no planes are active. According to >>> Nicholas, the clock gating can also happen if psr is active, and the >>> gating is controlled independently by the hardware, so difficult to >>> detect if and when the completion code in above commit is needed. >>> >>> This patch tries the following solution: It only executes the extra >>> pflip >>> completion code in dm_dcn_crtc_high_irq() iff the hardware reports >>> that there aren't any surface updated pending in the double-buffered >>> surface scanout address registers. Otherwise it leaves pflip completion >>> to the pflip irq handler, for a more race-free experience. >>> >>> This would only guard against the order of events mentioned above. >>> If Step 5 (VSTARTUP trigger) happens before step 4 then this won't help >>> at all, because 1-3 + 5 might happen even without the hw being >>> programmed >>> at all, ie. no surface update pending because none yet programmed >>> into hw. >>> >>> Therefore this patch also changes locking in amdgpu_dm_commit_planes(), >>> so that prepare_flip_isr() and dc_commit_updates_for_stream() are done >>> under event_lock protection within the same critical section. >>> >>> v2: Take Nicholas comments into account, try a different solution. >>> >>> Lightly tested on Polaris (locking) and Raven (the whole DCN stuff). >>> Seems to work without causing obvious new trouble. >> >> Nick, any comments on this? Can we get this committed or do you think >> it needs additional rework? >> >> Thanks, >> >> Alex > > Hi Alex, Mario, > > This might be a little strange, but if we want to get this in as a fix > for regressions caused by the original vblank and user events at > vstartup patch then I'm actually going to give my reviewed by on the > *v1* of this patch (but not this v2): > > Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx> > > You can feel free to apply that one. > > Reason 1: After having thought about it some more I don't think we > enable anything today that has hubp powered down at the same time we > expect to be waiting for a flip - eg. DMCU powering down HUBP during PSR > entry. Static screen interrupt should happen after that flip finishes I > think. > > The CRTC can still be powered on with zero planes, and I don't think any > userspace explicitly asks for vblank events in this case but it doesn't > hurt to have the check. > > Reason 2: This new patch will need much more thorough testing from side > to fully understand the consequences of locking the entire DC commit > sequence. For just a page flip that sounds fine, but for anything more > than (eg. full updates, modesets, etc) I don't think we want to be > disabling interrupts for potentially many milliseconds. Ah! I was wondering where the attached splat comes from, but I think this explains it: With this patch amdgpu_dm_commit_planes keeps the pcrtc->dev->event_lock spinlock locked while calling dc_commit_updates_for_stream, which ends up calling smu_set_display_count, which tries to lock a mutex. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer
Mar 13 12:44:49 kaveri kernel: [16728.466999][T347601] BUG: scheduling while atomic: Xorg/347601/0x00000002 Mar 13 12:44:49 kaveri kernel: [16728.467004][T347601] Modules linked in: fuse(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) rfkill(E) cpufreq_powersave(E) cpufreq_userspace(E) cpufreq_conservative(E) edac_mce_amd(E) kvm_amd(E) binfmt_misc(E) snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) irqbypass(E) ppdev(E) amdgpu(E) wmi_bmof(E) ledtrig_audio(E) nls_ascii(E) nls_cp437(E) snd_hda_codec_hdmi(E) vfat(E) gpu_sched(E) fat(E) crct10dif_pclmul(E) snd_hda_intel(E) ttm(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) drm_kms_helper(E) snd_hda_core(E) cec(E) snd_hwdep(E) ghash_clmulni_intel(E) snd_pcm(E) rc_core(E) r8169(E) snd_timer(E) aesni_intel(E) drm(E) efi_pstore(E) sp5100_tco(E) ccp(E) crypto_simd(E) realtek(E) snd(E) i2c_algo_bit(E) watchdog(E) Mar 13 12:44:49 kaveri kernel: [16728.467043][T347601] cryptd(E) mfd_core(E) glue_helper(E) pcspkr(E) efivars(E) sg(E) parport_pc(E) libphy(E) k10temp(E) soundcore(E) rng_core(E) i2c_piix4(E) parport(E) wmi(E) button(E) acpi_cpufreq(E) tcp_bbr(E) sch_fq(E) nct6775(E) hwmon_vid(E) sunrpc(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) sd_mod(E) evdev(E) hid_generic(E) usbhid(E) hid(E) xhci_pci(E) ahci(E) libahci(E) xhci_hcd(E) crc32c_intel(E) libata(E) usbcore(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E) Mar 13 12:44:49 kaveri kernel: [16728.467077][T347601] Preemption disabled at: Mar 13 12:44:49 kaveri kernel: [16728.467080][T347601] [<0000000000000000>] 0x0 Mar 13 12:44:49 kaveri kernel: [16728.467084][T347601] CPU: 7 PID: 347601 Comm: Xorg Tainted: G E 5.5.8+ #28 Mar 13 12:44:49 kaveri kernel: [16728.467086][T347601] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.OR 11/29/2019 Mar 13 12:44:49 kaveri kernel: [16728.467088][T347601] Call Trace: Mar 13 12:44:49 kaveri kernel: [16728.467096][T347601] dump_stack+0x66/0x90 Mar 13 12:44:49 kaveri kernel: [16728.467101][T347601] __schedule_bug.cold+0x8e/0x9b Mar 13 12:44:49 kaveri kernel: [16728.467106][T347601] __schedule+0x63c/0x790 Mar 13 12:44:49 kaveri kernel: [16728.467110][T347601] schedule+0x46/0xf0 Mar 13 12:44:49 kaveri kernel: [16728.467112][T347601] schedule_preempt_disabled+0x14/0x20 Mar 13 12:44:49 kaveri kernel: [16728.467115][T347601] __mutex_lock.isra.0+0x178/0x520 Mar 13 12:44:49 kaveri kernel: [16728.467207][T347601] smu_set_display_count+0x21/0x60 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467291][T347601] pp_nv_set_display_count+0x23/0x40 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467375][T347601] dcn2_update_clocks+0x597/0x650 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467460][T347601] dcn20_prepare_bandwidth+0x32/0x70 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467541][T347601] dc_commit_updates_for_stream+0xee4/0x15d0 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467623][T347601] amdgpu_dm_atomic_commit_tail+0x1210/0x2130 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467632][T347601] ? kfree+0x196/0x2b0 Mar 13 12:44:49 kaveri kernel: [16728.467649][T347601] commit_tail+0x94/0x130 [drm_kms_helper] Mar 13 12:44:49 kaveri kernel: [16728.467658][T347601] drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper] Mar 13 12:44:49 kaveri kernel: [16728.467682][T347601] drm_mode_obj_set_property_ioctl+0x115/0x2d0 [drm] Mar 13 12:44:49 kaveri kernel: [16728.467699][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm] Mar 13 12:44:49 kaveri kernel: [16728.467711][T347601] drm_ioctl_kernel+0xaa/0xf0 [drm] Mar 13 12:44:49 kaveri kernel: [16728.467724][T347601] drm_ioctl+0x208/0x390 [drm] Mar 13 12:44:49 kaveri kernel: [16728.467737][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm] Mar 13 12:44:49 kaveri kernel: [16728.467742][T347601] ? preempt_count_sub+0x9b/0xd0 Mar 13 12:44:49 kaveri kernel: [16728.467793][T347601] amdgpu_drm_ioctl+0x49/0x80 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.467799][T347601] do_vfs_ioctl+0x45f/0x6d0 Mar 13 12:44:49 kaveri kernel: [16728.467803][T347601] ksys_ioctl+0x5e/0x90 Mar 13 12:44:49 kaveri kernel: [16728.467806][T347601] __x64_sys_ioctl+0x16/0x20 Mar 13 12:44:49 kaveri kernel: [16728.467811][T347601] do_syscall_64+0x5f/0x200 Mar 13 12:44:49 kaveri kernel: [16728.467816][T347601] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Mar 13 12:44:49 kaveri kernel: [16728.467819][T347601] RIP: 0033:0x7f023d792497 Mar 13 12:44:49 kaveri kernel: [16728.467822][T347601] Code: 00 00 90 48 8b 05 f9 79 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 79 0c 00 f7 d8 64 89 01 48 Mar 13 12:44:49 kaveri kernel: [16728.467824][T347601] RSP: 002b:00007ffd773e0348 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Mar 13 12:44:49 kaveri kernel: [16728.467827][T347601] RAX: ffffffffffffffda RBX: 00007ffd773e0380 RCX: 00007f023d792497 Mar 13 12:44:49 kaveri kernel: [16728.467829][T347601] RDX: 00007ffd773e0380 RSI: 00000000c01864ba RDI: 000000000000000d Mar 13 12:44:49 kaveri kernel: [16728.467830][T347601] RBP: 00000000c01864ba R08: 000000000000005c R09: 00000000cccccccc Mar 13 12:44:49 kaveri kernel: [16728.467832][T347601] R10: 000055fbcc8a6bd4 R11: 0000000000000246 R12: 000055fbcb7199b0 Mar 13 12:44:49 kaveri kernel: [16728.467834][T347601] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000fff Mar 13 12:44:49 kaveri kernel: [16728.477723][T347601] ------------[ cut here ]------------ Mar 13 12:44:49 kaveri kernel: [16728.477728][T347601] DEBUG_LOCKS_WARN_ON(val > preempt_count()) Mar 13 12:44:49 kaveri kernel: [16728.477739][T347601] WARNING: CPU: 7 PID: 347601 at kernel/sched/core.c:3816 preempt_count_sub+0x70/0xd0 Mar 13 12:44:49 kaveri kernel: [16728.477743][T347601] Modules linked in: fuse(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) nft_counter(E) xt_addrtype(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) rfkill(E) cpufreq_powersave(E) cpufreq_userspace(E) cpufreq_conservative(E) edac_mce_amd(E) kvm_amd(E) binfmt_misc(E) snd_hda_codec_realtek(E) kvm(E) snd_hda_codec_generic(E) irqbypass(E) ppdev(E) amdgpu(E) wmi_bmof(E) ledtrig_audio(E) nls_ascii(E) nls_cp437(E) snd_hda_codec_hdmi(E) vfat(E) gpu_sched(E) fat(E) crct10dif_pclmul(E) snd_hda_intel(E) ttm(E) snd_intel_dspcfg(E) crc32_pclmul(E) snd_hda_codec(E) drm_kms_helper(E) snd_hda_core(E) cec(E) snd_hwdep(E) ghash_clmulni_intel(E) snd_pcm(E) rc_core(E) r8169(E) snd_timer(E) aesni_intel(E) drm(E) efi_pstore(E) sp5100_tco(E) ccp(E) crypto_simd(E) realtek(E) snd(E) i2c_algo_bit(E) watchdog(E) Mar 13 12:44:49 kaveri kernel: [16728.477780][T347601] cryptd(E) mfd_core(E) glue_helper(E) pcspkr(E) efivars(E) sg(E) parport_pc(E) libphy(E) k10temp(E) soundcore(E) rng_core(E) i2c_piix4(E) parport(E) wmi(E) button(E) acpi_cpufreq(E) tcp_bbr(E) sch_fq(E) nct6775(E) hwmon_vid(E) sunrpc(E) efivarfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) dm_mod(E) raid10(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) sd_mod(E) evdev(E) hid_generic(E) usbhid(E) hid(E) xhci_pci(E) ahci(E) libahci(E) xhci_hcd(E) crc32c_intel(E) libata(E) usbcore(E) scsi_mod(E) gpio_amdpt(E) gpio_generic(E) Mar 13 12:44:49 kaveri kernel: [16728.477809][T347601] CPU: 7 PID: 347601 Comm: Xorg Tainted: G W E 5.5.8+ #28 Mar 13 12:44:49 kaveri kernel: [16728.477811][T347601] Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK (MS-7A34), BIOS 1.OR 11/29/2019 Mar 13 12:44:49 kaveri kernel: [16728.477814][T347601] RIP: 0010:preempt_count_sub+0x70/0xd0 Mar 13 12:44:49 kaveri kernel: [16728.477817][T347601] Code: 6c 5b 5d c3 e8 11 c4 37 00 85 c0 74 f4 8b 15 f7 d9 27 01 85 d2 75 ea 48 c7 c6 a5 f9 cb 94 48 c7 c7 e5 b7 ca 94 e8 22 fc fc ff <0f> 0b eb d3 48 8b 6c 24 10 48 89 ef e8 df e4 02 00 85 c0 74 09 31 Mar 13 12:44:49 kaveri kernel: [16728.477819][T347601] RSP: 0018:ffffa393a61d7928 EFLAGS: 00010282 Mar 13 12:44:49 kaveri kernel: [16728.477821][T347601] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 Mar 13 12:44:49 kaveri kernel: [16728.477823][T347601] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000246 Mar 13 12:44:49 kaveri kernel: [16728.477824][T347601] RBP: ffffa393a61d7c38 R08: 0000000000000001 R09: 000000000000002a Mar 13 12:44:49 kaveri kernel: [16728.477826][T347601] R10: ffffffff95465690 R11: 00000000954652d3 R12: ffff8ab13baf2918 Mar 13 12:44:49 kaveri kernel: [16728.477828][T347601] R13: ffff8aae13ea6400 R14: ffff8ab13bae0000 R15: ffff8ab13e14c800 Mar 13 12:44:49 kaveri kernel: [16728.477830][T347601] FS: 00007f023d1cb940(0000) GS:ffff8ab14e9c0000(0000) knlGS:0000000000000000 Mar 13 12:44:49 kaveri kernel: [16728.477832][T347601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 13 12:44:49 kaveri kernel: [16728.477833][T347601] CR2: 00007ffa2900b54c CR3: 00000003a4340000 CR4: 00000000003406e0 Mar 13 12:44:49 kaveri kernel: [16728.477835][T347601] Call Trace: Mar 13 12:44:49 kaveri kernel: [16728.477842][T347601] _raw_spin_unlock_irqrestore+0x20/0x40 Mar 13 12:44:49 kaveri kernel: [16728.477935][T347601] amdgpu_dm_atomic_commit_tail+0x1ee3/0x2130 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.477944][T347601] ? kfree+0x196/0x2b0 Mar 13 12:44:49 kaveri kernel: [16728.477961][T347601] commit_tail+0x94/0x130 [drm_kms_helper] Mar 13 12:44:49 kaveri kernel: [16728.477970][T347601] drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper] Mar 13 12:44:49 kaveri kernel: [16728.477990][T347601] drm_mode_obj_set_property_ioctl+0x115/0x2d0 [drm] Mar 13 12:44:49 kaveri kernel: [16728.478008][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm] Mar 13 12:44:49 kaveri kernel: [16728.478020][T347601] drm_ioctl_kernel+0xaa/0xf0 [drm] Mar 13 12:44:49 kaveri kernel: [16728.478033][T347601] drm_ioctl+0x208/0x390 [drm] Mar 13 12:44:49 kaveri kernel: [16728.478047][T347601] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm] Mar 13 12:44:49 kaveri kernel: [16728.478051][T347601] ? preempt_count_sub+0x9b/0xd0 Mar 13 12:44:49 kaveri kernel: [16728.478102][T347601] amdgpu_drm_ioctl+0x49/0x80 [amdgpu] Mar 13 12:44:49 kaveri kernel: [16728.478108][T347601] do_vfs_ioctl+0x45f/0x6d0 Mar 13 12:44:49 kaveri kernel: [16728.478112][T347601] ksys_ioctl+0x5e/0x90 Mar 13 12:44:49 kaveri kernel: [16728.478115][T347601] __x64_sys_ioctl+0x16/0x20 Mar 13 12:44:49 kaveri kernel: [16728.478119][T347601] do_syscall_64+0x5f/0x200 Mar 13 12:44:49 kaveri kernel: [16728.478122][T347601] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Mar 13 12:44:49 kaveri kernel: [16728.478125][T347601] RIP: 0033:0x7f023d792497 Mar 13 12:44:49 kaveri kernel: [16728.478128][T347601] Code: 00 00 90 48 8b 05 f9 79 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 79 0c 00 f7 d8 64 89 01 48 Mar 13 12:44:49 kaveri kernel: [16728.478130][T347601] RSP: 002b:00007ffd773e0348 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Mar 13 12:44:49 kaveri kernel: [16728.478132][T347601] RAX: ffffffffffffffda RBX: 00007ffd773e0380 RCX: 00007f023d792497 Mar 13 12:44:49 kaveri kernel: [16728.478134][T347601] RDX: 00007ffd773e0380 RSI: 00000000c01864ba RDI: 000000000000000d Mar 13 12:44:49 kaveri kernel: [16728.478136][T347601] RBP: 00000000c01864ba R08: 000000000000005c R09: 00000000cccccccc Mar 13 12:44:49 kaveri kernel: [16728.478137][T347601] R10: 000055fbcc8a6bd4 R11: 0000000000000246 R12: 000055fbcb7199b0 Mar 13 12:44:49 kaveri kernel: [16728.478139][T347601] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000fff Mar 13 12:44:49 kaveri kernel: [16728.478143][T347601] ---[ end trace e1909b5aa94da59e ]---
_______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx