Re: [PATCH] drm/amd/display: Fix by adding FPU protection for dcn30_internal_validate_bw

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2022-03-30 09:57, Melissa Wen wrote:
On 03/30, VURDIGERENATARAJ, CHANDAN wrote:
Hi Paul,

Am 29.03.22 um 10:29 schrieb CHANDAN VURDIGERE NATARAJ:

Is it common to spell your name all uppercase? If not, please use Chandan nVurdigere Nataraj.

[WHY]

The [] already emphasize the word, so Why could be used.

Below general protection fault observed when WebGL Aquarium is run for
longer duration. If drm debug logs are enabled and set to 0x1f then
the

In what browser and what version?
The issue was observed on ChromiumOS and Chromium Browser version 100.0.4877.0

issue is observed within 10 minutes of run.

Where you able to reproduce it without drm debug logs?
Yes. It took 34 hours to reproduce without drm debug logs. Using drm debug logs was a faster way to reproduce the issue.

[  100.717056] general protection fault, probably for non-canonical address 0x2d33302d32323032: 0000 [#1] PREEMPT SMP NOPTI
[  100.727921] CPU: 3 PID: 1906 Comm: DrmThread Tainted: G        W         5.15.30 #12 d726c6a2d6ebe5cf9223931cbca6892f916fe18b
[  100.754419] RIP: 0010:CalculateSwathWidth+0x1f7/0x44f
[  100.767109] Code: 00 00 00 f2 42 0f 11 04 f0 48 8b 85 88 00 00 00
f2 42 0f 10 04 f0 48 8b 85 98 00 00 00 f2 42 0f 11 04 f0 48 8b 45 10
0f 57 c0 <f3> 42 0f 2a 04 b0 0f 57 c9 f3 43 0f 2a 0c b4 e8 8c e2 f3 ff
48 8b [  100.781269] RSP: 0018:ffffa9230079eeb0 EFLAGS: 00010246 [
100.812528] RAX: 2d33302d32323032 RBX: 0000000000000500 RCX:
0000000000000000 [  100.819656] RDX: 0000000000000001 RSI:
ffff99deb712c49c RDI: 0000000000000000 [  100.826781] RBP:
ffffa9230079ef50 R08: ffff99deb712460c R09: ffff99deb712462c [
100.833907] R10: ffff99deb7124940 R11: ffff99deb7124d70 R12:
ffff99deb712ae44 [  100.841033] R13: 0000000000000001 R14:
0000000000000000 R15: ffffa9230079f0a0 [  100.848159] FS:  00007af121212640(0000) GS:ffff99deba780000(0000) knlGS:0000000000000000 [  100.856240] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [  100.861980] CR2: 0000209000fe1000 CR3: >000000011b18c000 CR4: 0000000000350ee0 [  100.869106] Call Trace:
[  100.871555]  <TASK>
[  100.873655]  ? asm_sysvec_reschedule_ipi+0x12/0x20
[  100.878449]  CalculateSwathAndDETConfiguration+0x1a3/0x6dd
[  100.883937]
dml31_ModeSupportAndSystemConfigurationFull+0x2ce4/0x76da
[  100.890467]  ? kallsyms_lookup_buildid+0xc8/0x163
[  100.895173]  ? kallsyms_lookup_buildid+0xc8/0x163
[  100.899874]  ? __sprint_symbol+0x80/0x135 [  100.903883]  ?
dm_update_plane_state+0x3f9/0x4d2 [  100.908500]  ?
symbol_string+0xb7/0xde [  100.912250]  ? number+0x145/0x29b [
100.915566]  ? vsnprintf+0x341/0x5ff [  100.919141]  ?
desc_read_finalized_seq+0x39/0x87 [  100.923755]  ?
update_load_avg+0x1b9/0x607 [  100.927849]  ?
compute_mst_dsc_configs_for_state+0x7d/0xd5b
[  100.933416]  ? fetch_pipe_params+0xa4d/0xd0c [  100.937686]  ?
dc_fpu_end+0x3d/0xa8 [  100.941175]  dml_get_voltage_level+0x16b/0x180
[  100.945619]  dcn30_internal_validate_bw+0x10e/0x89b
[  100.950495]  ? dcn31_validate_bandwidth+0x68/0x1fc
[  100.955285]  ? resource_build_scaling_params+0x98b/0xb8c
[  100.960595]  ? dcn31_validate_bandwidth+0x68/0x1fc
[  100.965384]  dcn31_validate_bandwidth+0x9a/0x1fc
[  100.970001]  dc_validate_global_state+0x238/0x295
[  100.974703]  amdgpu_dm_atomic_check+0x9c1/0xbce
[  100.979235]  ? _printk+0x59/0x73
[  100.982467]  drm_atomic_check_only+0x403/0x78b [  100.986912]
drm_mode_atomic_ioctl+0x49b/0x546 [  100.991358]  ?
drm_ioctl+0x1c1/0x3b3 [  100.994936]  ?
drm_atomic_set_property+0x92a/0x92a
[  100.999725]  drm_ioctl_kernel+0xdc/0x149 [  101.003648]
drm_ioctl+0x27f/0x3b3 [  101.007051]  ?
drm_atomic_set_property+0x92a/0x92a
[  101.011842]  amdgpu_drm_ioctl+0x49/0x7d [  101.015679]
__se_sys_ioctl+0x7c/0xb8 [  101.015685]  do_syscall_64+0x5f/0xb8 [
101.015690]  ? __irq_exit_rcu+0x34/0x96

[HOW]
It calles populate_dml_pipes which uses doubles to initialize.

calls

Excuse my ignorance. So using doubles causes a context switch?
If we don’t add FPU protection then context switch can happen. DC_FP_START would in-turn call preempt_disable.

Adding FPU protection avoids context switch and probable loss of vba
context as there is potential contention while drm debug logs are enabled.

Signed-off-by: CHANDAN VURDIGERE NATARAJ
<chandan.vurdigerenataraj@xxxxxxx>

diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
index 826970f2bd0a..f27262417abe 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_resource.c
@@ -1750,7 +1750,9 @@ bool dcn31_validate_bandwidth(struct dc *dc,
BW_VAL_TRACE_COUNT(); + DC_FP_START();
   	out = dcn30_internal_validate_bw(dc, context, pipes, &pipe_cnt,
&vlevel, fast_validate);
+	DC_FP_END();
// Disable fast_validate to set min dcfclk in alculate_wm_and_dlg

Hi Chandan,

This change lgtm, however, you need to rebase this patch with the latest code from amd-staging-drm-next... with that change:

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@xxxxxxx>

   	if (pipe_cnt == 0)
Acked-by: Melissa Wen <mwen@xxxxxxxxxx>

In fact, I revisited the code and realized the FPU protection is
missing for two other dcn30 functions called by dcn31:
- dcn30_populate_dml_writeback_from_context()
- dcn30_set_mcif_arb_params()

I'll send a patch addressing this shortly.

CC'ing: Siqueira

Best regards,

Melissa


Kind regards,

Paul




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux