On 2021-03-05 13:23, Holger Hoffstätte wrote:
On 2021-03-05 12:39, Holger Hoffstätte wrote:
Commit 41401ac67791 added FPU wrappers to dcn21_validate_bandwidth(),
which was correct. Unfortunately, a nested function already contained
DC_FP_START()/DC_FP_END() calls, which results in nested FPU context
enter/exit and complaints by kernel_fpu_begin_mask().
This can be observed e.g. with 5.10.20, which backported 41401ac67791
and now emits the following warning on boot:
WARNING: CPU: 6 PID: 858 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xa5/0xc0
Call Trace:
dcn21_calculate_wm+0x47/0xa90 [amdgpu]
dcn21_validate_bandwidth_fp+0x15d/0x2b0 [amdgpu]
dcn21_validate_bandwidth+0x29/0x40 [amdgpu]
dc_validate_global_state+0x3c7/0x4c0 [amdgpu]
The warning is emitted due to the additional DC_FP_START/END calls in
patch_bounding_box(), which is inlined into dcn21_calculate_wm(),
its only caller. Removing the calls brings the code in line with
dcn20 and makes the warning disappear.
Fixes: 41401ac67791 ("drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()")
Signed-off-by: Holger Hoffstätte <holger@xxxxxxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index 072f8c880924..68be73fe2e23 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -1062,8 +1062,6 @@ static void patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
{
int i;
- DC_FP_START();
-
if (dc->bb_overrides.sr_exit_time_ns) {
for (i = 0; i < WM_SET_COUNT; i++) {
dc->clk_mgr->bw_params->wm_table.entries[i].sr_exit_time_us =
@@ -1088,8 +1086,6 @@ static void patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
dc->bb_overrides.dram_clock_change_latency_ns / 1000.0;
}
}
-
- DC_FP_END();
}
void dcn21_calculate_wm(
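The nesting described in the commit message can be reproduced with a small
userspace model. This is only a sketch: the model_* names are made up for
illustration, and a plain bool stands in for the kernel's per-CPU
in_kernel_fpu flag (the real code also disables preemption). model_inner()
plays the role of patch_bounding_box(), with and without its own wrappers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's per-CPU in_kernel_fpu flag. */
static bool model_in_kernel_fpu;
static int model_warnings;

/* Hypothetical model of kernel_fpu_begin(): warn if a section is already open. */
static void model_fpu_begin(void)
{
	if (model_in_kernel_fpu) {
		model_warnings++;
		fprintf(stderr, "WARNING: nested FPU begin\n");
	}
	model_in_kernel_fpu = true;
}

/* Hypothetical model of kernel_fpu_end(): warn if no section is open. */
static void model_fpu_end(void)
{
	if (!model_in_kernel_fpu) {
		model_warnings++;
		fprintf(stderr, "WARNING: FPU end without open section\n");
	}
	model_in_kernel_fpu = false;
}

/* Inner helper; 'wrapped' selects the behaviour before (true) or after (false) the patch. */
static void model_inner(bool wrapped)
{
	if (wrapped)
		model_fpu_begin();
	/* floating-point work would happen here */
	if (wrapped)
		model_fpu_end();
}

/* Outer caller that already owns the FPU section; returns the number of warnings raised. */
int model_outer(bool inner_wrapped)
{
	model_warnings = 0;
	model_in_kernel_fpu = false;

	model_fpu_begin();
	model_inner(inner_wrapped);
	model_fpu_end();

	return model_warnings;
}
```

In this model, model_outer(true) raises two warnings: one when the inner begin
finds the section already open, and one when the outer end finds that the inner
end has already closed it. model_outer(false) raises none.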
Hmm, this is getting confusing: I was just greeted by the following for
no obvious reason (probably while playing a video in the browser):
[Mar 5 12:38] ------------[ cut here ]------------
[ +0.000006] WARNING: CPU: 8 PID: 3803 at arch/x86/kernel/fpu/core.c:155 kernel_fpu_end+0x19/0x20
[ +0.000001] Modules linked in: auth_rpcgss nfsv4 dns_resolver lz4 lz4_compress lz4_decompress nfs lockd grace nfs_ssc sunrpc tcp_bbr2 iwlmvm pkcs8_key_parser amdgpu mac80211 lm92 libarc4 snd_hda_codec_realtek wmi_bmof drivetemp iommu_v2 snd_hda_codec_generic gpu_sched ttm i2c_algo_bit btusb btrtl drm_kms_helper snd_hda_codec_hdmi btbcm btintel uvcvideo cec videobuf2_vmalloc videobuf2_memops iwlwifi videobuf2_v4l2 edac_mce_amd snd_hda_intel videobuf2_common crct10dif_pclmul snd_intel_dspcfg crc32_pclmul drm bluetooth crc32c_intel snd_hda_codec videodev ghash_clmulni_intel syscopyarea snd_rn_pci_acp3x snd_hwdep sysfillrect ecdh_generic rapl serio_raw mc ecc snd_hda_core k10temp sysimgblt snd_pci_acp3x fb_sys_fops i2c_piix4 cfg80211 snd_pcm snd_timer r8169 ccp ipmi_devintf ipmi_msghandler realtek thinkpad_acpi ucsi_acpi typec_ucsi snd typec soundcore wmi ledtrig_audio rfkill ac battery video i2c_scmi pinctrl_amd button
[ +0.000036] CPU: 8 PID: 3803 Comm: X Not tainted 5.10.20 #1
[ +0.000001] Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
[ +0.000001] RIP: 0010:kernel_fpu_end+0x19/0x20
[ +0.000001] Code: ae 47 40 b8 01 00 00 00 c3 0f 0b eb d7 0f 0b eb c9 0f 1f 44 00 00 65 8a 05 dc 42 ff 7e 84 c0 74 09 65 c6 05 d0 42 ff 7e 00 c3 <0f> 0b eb f3 0f 1f 00 0f 1f 44 00 00 8b 15 95 d2 03 02 31 f6 e8 0e
[ +0.000001] RSP: 0018:ffffc900007b78d0 EFLAGS: 00010246
[ +0.000001] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027d46
[ +0.000000] RDX: 0000000000027d45 RSI: ffffffffa0d6873d RDI: 000000000002ab00
[ +0.000001] RBP: ffff888349ac0000 R08: 0000000000000480 R09: 00000000000003bf
[ +0.000001] R10: ffffc900007b77e8 R11: 0000000000000000 R12: 0000000000000001
[ +0.000000] R13: ffff88810b2e0000 R14: 0000000000000002 R15: 0000000080000000
[ +0.000001] FS: 00007f6f002558c0(0000) GS:ffff8883ff600000(0000) knlGS:0000000000000000
[ +0.000001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000000] CR2: 00007f255134f8d0 CR3: 000000010431a000 CR4: 0000000000350ee0
[ +0.000001] Call Trace:
[ +0.000053] dcn21_validate_bandwidth+0x31/0x40 [amdgpu]
[ +0.000028] dc_commit_updates_for_stream+0x9d9/0x2aa0 [amdgpu]
[ +0.000033] amdgpu_dm_atomic_commit_tail+0x1374/0x2260 [amdgpu]
[ +0.000005] commit_tail+0x8f/0x120 [drm_kms_helper]
[ +0.000003] drm_atomic_helper_commit+0x1d3/0x200 [drm_kms_helper]
[ +0.000005] drm_mode_obj_set_property_ioctl+0x118/0x380 [drm]
[ +0.000004] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ +0.000003] drm_ioctl_kernel+0x8a/0x120 [drm]
[ +0.000004] drm_ioctl+0x1f1/0x3b0 [drm]
[ +0.000003] ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ +0.000019] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ +0.000002] __x64_sys_ioctl+0x152/0x920
[ +0.000002] ? _copy_from_user+0x28/0x60
[ +0.000002] ? restore_altstack+0x19/0xd0
[ +0.000003] do_syscall_64+0x2d/0x40
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000001] RIP: 0033:0x7f6f007549b7
[ +0.000002] Code: 1f 40 00 48 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b1 e8 0c ff ff ff 85 c0 78 b6 5b 4c 89 e0 5d 41 5c c3 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 c4 0c 00 f7 d8 64 89 01 48
[ +0.000000] RSP: 002b:00007ffe9c0f4788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ +0.000001] RAX: ffffffffffffffda RBX: 00007ffe9c0f47c0 RCX: 00007f6f007549b7
[ +0.000000] RDX: 00007ffe9c0f47c0 RSI: 00000000c01864ba RDI: 000000000000000b
[ +0.000001] RBP: 00000000c01864ba R08: 000000000000006d R09: 00000000cccccccc
[ +0.000000] R10: 0000000000000fff R11: 0000000000000246 R12: 000055a9b98d6720
[ +0.000000] R13: 000000000000000b R14: 0000000000000000 R15: 0000000000000003
[ +0.000001] ---[ end trace 9f0368711896f6eb ]---
...which indicates that there is another spurious kernel_fpu_begin()/kernel_fpu_end()
pair somewhere, or I'm misreading things.
It's curious that these warnings only appeared after 41401ac67791; apparently this
is messier than it seems.
Any clues welcome.
Looks like this is a replay of f41ed88cbd ("drm/amdgpu/display: use GFP_ATOMIC
in dcn20_validate_bandwidth_internal"), but this time for dcn21, which still uses
GFP_KERNEL. I'll send a patch.
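For context, a rough userspace sketch of why a GFP_KERNEL allocation is a
problem here, assuming the allocation happens between DC_FP_START() and
DC_FP_END() as the reference to f41ed88cbd suggests. kernel_fpu_begin()
disables preemption, and a GFP_KERNEL allocation may sleep. All model_* names
and the MODEL_GFP_* flags are hypothetical stand-ins, not kernel API.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_KERNEL (may sleep) and GFP_ATOMIC (never sleeps). */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

static int model_preempt_count;	/* > 0 means sleeping is not allowed */
static int model_sleep_bugs;	/* number of "may sleep in atomic context" hits */

/* Model of DC_FP_START()/DC_FP_END(): the FPU section is non-preemptible. */
static void model_fp_start(void) { model_preempt_count++; }
static void model_fp_end(void)   { model_preempt_count--; }

/* Model allocator: flag a bug if a sleeping allocation is requested while atomic. */
static void *model_alloc(size_t size, enum model_gfp flags)
{
	if (flags == MODEL_GFP_KERNEL && model_preempt_count > 0)
		model_sleep_bugs++;
	return calloc(1, size);
}

/* Model of a validate function that allocates inside the FPU section. */
int model_validate(enum model_gfp flags)
{
	void *ctx;

	model_sleep_bugs = 0;
	model_fp_start();
	ctx = model_alloc(64, flags);
	free(ctx);
	model_fp_end();

	return model_sleep_bugs;
}
```

Here model_validate(MODEL_GFP_KERNEL) reports one violation and
model_validate(MODEL_GFP_ATOMIC) reports none.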
-h