Re: [PATCH 05/18] drm/amd/display: Use mdelay to avoid crashes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 15.12.22 um 00:33 schrieb Alex Hung:


On 2022-12-14 16:06, Alex Deucher wrote:
On Wed, Dec 14, 2022 at 5:56 PM Alex Hung <alex.hung@xxxxxxx> wrote:



On 2022-12-14 15:35, Alex Deucher wrote:
On Wed, Dec 14, 2022 at 5:25 PM Alex Hung <alex.hung@xxxxxxx> wrote:



On 2022-12-14 14:54, Alex Deucher wrote:
On Wed, Dec 14, 2022 at 4:50 PM Alex Hung <alex.hung@xxxxxxx> wrote:



On 2022-12-14 13:48, Alex Deucher wrote:
On Wed, Dec 14, 2022 at 3:22 PM Aurabindo Pillai
<aurabindo.pillai@xxxxxxx> wrote:

From: Alex Hung <alex.hung@xxxxxxx>

[Why]
When running IGT kms_bw test with DP monitor, some systems crash from
msleep no matter how long or short the time is.

[How]
To replace msleep with mdelay.

Can you provide a bit more info about the crash?  A lot of platforms
don't support delay larger than 2-4ms so this change will generate
errors on ARM and possibly other platforms.

Alex

The msleep was introduced in eec3303de3378 for non-compliant display port monitors but IGT's kms_bw test can cause a recent h/w to hang at
msleep(60) when calling "igt_remove_fb" in IGT
(https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Figt-gpu-tools%2F-%2Fblob%2Fmaster%2Ftests%2Fkms_bw.c%23L197&amp;data=05%7C01%7Calex.hung%40amd.com%7C81664450189542a7bbc408dade27d0e9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638066559844526853%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=M%2BF4H2qddXItPoUZRVyhlc5N8UF1%2FHIh8PpfT%2BCmdZ4%3D&amp;reserved=0)

It is possible to workaround this by reversing order of
igt_remove_fb(&buffer[i]), as the following example:

      igt_create_color_fb with the order buffer[0], buffer[1], buffer[2]

Hangs:
      igt_remove_fb with the order buffer[0], buffer[1], buffer[2]

No hangs:
      igt_remove_fb with the reversed order buffer[2], buffer[1], buffer[0]

However, IGT simply exposes the problem and it makes more sense to stop
the hang from occurring.

I also tried to remove the msleep completely and it also work, but I
didn't want to break the fix for the original problematic hardware
configuration.

Why does sleep vs delay make a difference?  Is there some race that we
are not locking against?

Alex


That was my original thought but I did not find any previously. I will
investigate it again.

If mdelay(>4) isn't usable on other platforms, is it an option to use
mdelay on x86_64 only and keep msleep on other platforms or just remove
the msleep for other platforms, something like

-                       msleep(60);
+#ifdef CONFIG_X86_64
+                       mdelay(60);
+#endif

That's pretty ugly.  I'd rather try and resolve the root cause.  How
important is the IGT test?  What does it do?  Is the test itself
correct?


Agreed, and I didn't want to add conditions around the mdelay for the
same reason. I will assume this is not an option now.

As in the previous comment, IGT can be modified to avoid the crash by
reversing the order fb is removed - though I suspect I will receive
questions why this is not fixed in kernel.

I wanted to fix this in kernel because nothing stops other user-space
applications to use the same way to crash kernel, so fixing IGT is the
second option.

Apparently causing problems on other platforms isn't an option at all so
I will try to figure out an non-mdelay solution, and then maybe an IGT
solution instead.


What hangs?  The test or the kernel or the hardware?

The system becomes completely unresponsive - no keyboard, mouse nor remote accesses.

I agree with Alex that changing this is extremely questionable and not justified at all.

My educated guess is that by using mdelay() instead of msleep() we keep the CPU core busy and preventing something from happening at the same time as something else.

This clearly points to missing locking or similar to protect concurrent execution of things.

Christian.



Alex


Alex







Acked-by: Aurabindo Pillai <aurabindo.pillai@xxxxxxx>
Signed-off-by: Alex Hung <alex.hung@xxxxxxx>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@xxxxxxx>
---
drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
index 913a1fe6b3da..e6251ccadb70 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -1215,7 +1215,7 @@ void dce110_blank_stream(struct pipe_ctx *pipe_ctx)                              * After output is idle pattern some sinks need time to recognize the stream                              * has changed or they enter protection state and hang.
                             */
-                       msleep(60);
+                       mdelay(60);
                    } else if (pipe_ctx->stream->signal == SIGNAL_TYPE_EDP) {                             if (!link->dc->config.edp_no_power_sequencing) {
                                    /*
--
2.39.0





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux