Re: Patch "drm/amd/display: release spinlock before committing updates to stream" has been added to the 4.17-stable tree

Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx> · Fri, 6 Jul 2018 09:37:44 -0400

On 07/06/2018 02:26 AM, S, Shirish wrote:

On 7/5/2018 11:11 PM, Harry Wentland wrote:
On 2018-07-05 12:09 PM, Andrey Grodzovsky wrote:
On 07/05/2018 12:06 PM, Greg KH wrote:
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from 
his review
of the patch.

"Actually, pflip_status should really only be set to 
AMDGPU_FLIP_SUBMITTED
after the flip has been programmed to the hardware, at least as 
far as the
lock holder is concerned. That's why the code was previously 
holding the
lock until after the dc_commit_updates_for_stream call. Otherwise, 
it's at
least theoretically possible that either:

* dm_pflip_high_irq is called before dc_commit_updates_for_stream, 
but sees
flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace
prematurely

* dm_pflip_high_irq is called after dc_commit_updates_for_stream, 
but sees
flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't 
send the
event to userspace "

It shouldn't go in.
Is there a fix for this in Linus's tree for these problems?
If not, why not?  If so, what is that git commit id?
AFAIK there is still no fix for the original issue which this patch 
was intended to fix.
Andrey,
Ideally  sleep's should not be coded within spinlock block, in this 
case its coded in dc_commit_updates_for_stream() at the below 
occurrences:
1. In drivers/gpu/drm/amd/amdgpu/atom.c => 
amdgpu_atom_execute_table_locked() => kzalloc
2. mutex callbacks in drivers/gpu/drm/amd/powerplay/amd_powerplay.c & 
drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c.
3. In drivers/gpu/drm/amd/display/dc/core/dc.c => dc_create_state() => 
kzalloc

In my case,  when rendering on multiple planes,  it led to system 
instabilities and with this patch applied we no more see them.
(analysis further below in this mail).

Also when posted on amd-gfx with the corresponding RB's there was no 
objection to this patch not sure if nobody saw this patch getting merged.
Harry, can you please comment on this ?

That can be better answered by Shirish.

In my opinion the issue described by Michel is a lesser evil than 
attempting to allocate memory inside a spinlocked area.
Yes Harry. I agree, hence i pushed this patch as we have a code that 
attempts to allocate memory and furthermore calls mutex's.
Shirish, we need to understand why dc_commit_updates_for_stream() is 
trying to do a UPDATE_TYPE_FULL in your scenario. This should never 
really happen through the amdgpu_dm_do_flip code path and we need to 
understand why.
This happens in amdgpu_dm_do_flip() because of the hard-coded number 
of surfaces it call dc_commit_updates_for_stream() with,
        dc_commit_updates_for_stream(adev->dm.dc,
                                             surface_updates,
                                             1, <<<<<<<<<
acrtc_state->stream,
                                             NULL,
&surface_updates->surface,
                                             state);

which is 1, where as in case of rendering on multiple planes the 
stream has 2 surfaces attached and hence the below  condition fails in 
very first check of check_update_surfaces_for_stream()
"if ( ... || stream_status->plane_count != surface_count)"

  If we never do UPDATE_TYPE_FULL here we should not need to allocate 
memory and should be find with leaving the spinlock around 
dc_commit_updates_for_stream().
As i understand we may end-up trying to do a UPDATE_TYPE_FULL due to 
failure in any of the several checks in 
dc_check_update_surfaces_for_stream().
We cannot rule out the possibility of a code path being executed as 
long as it exists, ideally we should code another callback in 
amdgpu_dm_flip() that presumes the update type is always 
UPDATE_TYPE_FAST.

Till we fix this hard-coding i recommend that we should have this 
patch, am open to avoiding it if it breaks existing working systems or 
offends any builds.

Thanks.
Regards,
Shirish S

I believe that at least then we should put a BIG comment in the code 
explaining this change both to avoid future people looking into it 
trying to revert it
believing it's just a plain bug and for possible proper revert when the 
underlying issue with sleeping in atomic context is resolved.

Andrey

Harry

Andrey

thanks,

greg k-h