Re: amdgpu freezes kernel after kernel 5.7.6 changes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I narrowed it down even more. When reverting 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b5232e2ee8df85891514c73472cac09921e5d51d everything is stable. I have an uptime of 4+ hours while actively
trying to reproduce the bug (gaming).

Here is a bug report: 
https://gitlab.freedesktop.org/drm/amd/-/issues/1191

Added Nicholas Kazlauskas, author of the mentioned commit.

Am Samstag, den 27.06.2020, 18:03 +0200 schrieb roland.rucky@xxxxxxxxx:
> Not sure if I am contacting the correct person,
> 
> Since I updated to kernel 5.7.6, my system started to freeze
> randomly.
> After a couple of freezes, I noticed, that they always happen when
> playing games, or during videoplayback in e.g. firefox.
> 
> I reverted to the previous kernel 5.7.5, and all issues are gone.
> Next
> I started to revert and test single commits between the two kernel
> versions, which affect amdgpu. If I revert the changes listed below,
> the kernel does not freeze any more.
> 
> Sadly I can`t get any crash reports / logs. Even the magic sysrq key
> does not work, when the system is frozen.
> 
> I will also attach a patch, which includes all reverted commits.
> 
> 
> List of changes I reverted:
> -------------------------------------------------------------------
> ----
> 
> commit 6674508ba1a2ea6caca5de2bcb25bc00a050fd0a
> Author: Harry Wentland <harry.wentland@xxxxxxx>
> Date:   Thu May 28 09:44:44 2020 -0400
> 
>     Revert "drm/amd/display: disable dcn20 abm feature for bring up"
> 
>     commit 14ed1c908a7a623cc0cbf0203f8201d1b7d31d16 upstream.
> 
>     This reverts commit 96cb7cf13d8530099c256c053648ad576588c387.
> 
>     This change was used for DCN2 bringup and is no longer desired.
>     In fact it breaks backlight on DCN2 systems.
> 
>     Cc: Alexander Monakov <amonakov@xxxxxxxxx>
>     Cc: Hersen Wu <hersenxs.wu@xxxxxxx>
>     Cc: Anthony Koo <Anthony.Koo@xxxxxxx>
>     Cc: Michael Chiu <Michael.Chiu@xxxxxxx>
>     Signed-off-by: Harry Wentland <harry.wentland@xxxxxxx>
>     Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>
>     Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx>
>     Reported-and-tested-by: Alexander Monakov <amonakov@xxxxxxxxx>
>     Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>     Cc: stable@xxxxxxxxxxxxxxx
>     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 7fc15b82fe48..f9f02e08054b 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -1334,7 +1334,7 @@ static int dm_late_init(void *handle)
>  	unsigned int linear_lut[16];
>  	int i;
>  	struct dmcu *dmcu = adev->dm.dc->res_pool->dmcu;
> -	bool ret = false;
> +	bool ret;
> 
>  	for (i = 0; i < 16; i++)
>  		linear_lut[i] = 0xFFFF * i / 15;
> @@ -1350,13 +1350,10 @@ static int dm_late_init(void *handle)
>  	 */
>  	params.min_abm_backlight = 0x28F;
> 
> -	/* todo will enable for navi10 */
> -	if (adev->asic_type <= CHIP_RAVEN) {
> -		ret = dmcu_load_iram(dmcu, params);
> +	ret = dmcu_load_iram(dmcu, params);
> 
> -		if (!ret)
> -			return -EINVAL;
> -	}
> +	if (!ret)
> +		return -EINVAL;
> 
>  	return detect_mst_link_for_all_connectors(adev->ddev);
>  }
> 
> commit fba8f9ef7e1405ee6f422beb874791e8a5eb489c
> Author: Alex Deucher <alexander.deucher@xxxxxxx>
> Date:   Tue Jun 2 17:22:48 2020 -0400
> 
>     drm/amdgpu/display: use blanked rather than plane state for sync
> groups
> 
>     commit b7f839d292948142eaab77cedd031aad0bfec872 upstream.
> 
>     We may end up with no planes set yet, depending on the ordering,
> but we
>     should have the proper blanking state which is either handled by
> either
>     DPG or TG depending on the hardware generation.  Check both to
> determine
>     the proper blanked state.
> 
>     Bug: https://gitlab.freedesktop.org/drm/amd/issues/781
>     Fixes: 5fc0cbfad45648 ("drm/amd/display: determine if a pipe is
> synced by plane state")
>     Cc: nicholas.kazlauskas@xxxxxxx
>     Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx>
>     Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>     Cc: stable@xxxxxxxxxxxxxxx
>     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 4a619328101c..4acaf4be8a81 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -1011,9 +1011,17 @@ static void program_timing_sync(
>  			}
>  		}
> 
> -		/* set first pipe with plane as master */
> +		/* set first unblanked pipe as master */
>  		for (j = 0; j < group_size; j++) {
> -			if (pipe_set[j]->plane_state) {
> +			bool is_blanked;
> +
> +			if (pipe_set[j]->stream_res.opp->funcs-
> > dpg_is_blanked)
> +				is_blanked =
> +					pipe_set[j]->stream_res.opp-
> > funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
> +			else
> +				is_blanked =
> +					pipe_set[j]->stream_res.tg-
> > funcs->is_blanked(pipe_set[j]->stream_res.tg);
> +			if (!is_blanked) {
>  				if (j == 0)
>  					break;
> 
> @@ -1034,9 +1042,17 @@ static void program_timing_sync(
>  				status->timing_sync_info.master =
> false;
> 
>  		}
> -		/* remove any other pipes with plane as they have
> already been synced */
> +		/* remove any other unblanked pipes as they have
> already been synced */
>  		for (j = j + 1; j < group_size; j++) {
> -			if (pipe_set[j]->plane_state) {
> +			bool is_blanked;
> +
> +			if (pipe_set[j]->stream_res.opp->funcs-
> > dpg_is_blanked)
> +				is_blanked =
> +					pipe_set[j]->stream_res.opp-
> > funcs->dpg_is_blanked(pipe_set[j]->stream_res.opp);
> +			else
> +				is_blanked =
> +					pipe_set[j]->stream_res.tg-
> > funcs->is_blanked(pipe_set[j]->stream_res.tg);
> +			if (!is_blanked) {
>  				group_size--;
>  				pipe_set[j] = pipe_set[group_size];
>  				j--;
> 
> commit b5232e2ee8df85891514c73472cac09921e5d51d
> Author: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx>
> Date:   Tue Jun 2 20:42:33 2020 -0400
> 
>     drm/amd/display: Revalidate bandwidth before commiting DC updates
> 
>     [ Upstream commit a24eaa5c51255b344d5a321f1eeb3205f2775498 ]
> 
>     [Why]
>     Whenever we switch between tiled formats without also switching
> pixel
>     formats or doing anything else that recreates the DC plane state
> we
>     can run into underflow or hangs since we're not updating the
>     DML parameters before committing to the hardware.
> 
>     [How]
>     If the update type is FULL then call validate_bandwidth again to
> update
>     the DML parmeters before committing the state.
> 
>     This is basically just a workaround and protective measure
> against
>     update types being added DC where we could run into this issue in
>     the future.
> 
>     We can only fully validate the state in advance before applying
> it
> to
>     the hardware if we recreate all the plane and stream states since
>     we can't modify what's currently in use.
> 
>     The next step is to update DM to ensure that we're creating the
> plane
>     and stream states for whatever could potentially be a full update
> in
>     DC to pre-emptively recreate the state for DC global validation.
> 
>     The workaround can stay until this has been fixed in DM.
> 
>     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@xxxxxxx>
>     Reviewed-by: Hersen Wu <hersenxs.wu@xxxxxxx>
>     Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>     Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 47431ca6986d..4a619328101c 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -2517,6 +2517,12 @@ void dc_commit_updates_for_stream(struct dc
> *dc,
> 
>  	copy_stream_update_to_stream(dc, context, stream,
> stream_update);
> 
> +	if (!dc->res_pool->funcs->validate_bandwidth(dc, context,
> false)) {
> +		DC_ERROR("Mode validation failed for stream
> update!\n");
> +		dc_release_state(context);
> +		return;
> +	}
> +
>  	commit_planes_for_stream(
>  				dc,
>  				srf_updates,
> 
> Details:
> 
> * Kernel: 5.7.6
> * GPU: radeon 5700XT
> * CPU: ryzen 3800X
> * running on swaywm(wayland)

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux