Re: Patch "drm/amd/display: release spinlock before committing updates to stream" has been added to the 4.17-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 07/06/2018 02:26 AM, S, Shirish wrote:


On 7/5/2018 11:11 PM, Harry Wentland wrote:
On 2018-07-05 12:09 PM, Andrey Grodzovsky wrote:
On 07/05/2018 12:06 PM, Greg KH wrote:
On Thu, Jul 05, 2018 at 12:05:09PM -0400, Andrey Grodzovsky wrote:
This patch is wrong as noted by MIchel a while ago - quote from his review
of the patch.

"Actually, pflip_status should really only be set to AMDGPU_FLIP_SUBMITTED after the flip has been programmed to the hardware, at least as far as the lock holder is concerned. That's why the code was previously holding the lock until after the dc_commit_updates_for_stream call. Otherwise, it's at
least theoretically possible that either:

* dm_pflip_high_irq is called before dc_commit_updates_for_stream, but sees
flip_status == AMDGPU_FLIP_SUBMITTED and sends the event to userspace
prematurely

* dm_pflip_high_irq is called after dc_commit_updates_for_stream, but sees flip_status != AMDGPU_FLIP_SUBMITTED, so it incorrectly doesn't send the
event to userspace "

It shouldn't go in.
Is there a fix for this in Linus's tree for these problems?
If not, why not?  If so, what is that git commit id?
AFAIK there is still no fix for the original issue which this patch was intended to fix.
Andrey,
Ideally  sleep's should not be coded within spinlock block, in this case its coded in dc_commit_updates_for_stream() at the below occurrences: 1. In drivers/gpu/drm/amd/amdgpu/atom.c => amdgpu_atom_execute_table_locked() => kzalloc 2. mutex callbacks in drivers/gpu/drm/amd/powerplay/amd_powerplay.c & drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c. 3. In drivers/gpu/drm/amd/display/dc/core/dc.c => dc_create_state() => kzalloc

In my case,  when rendering on multiple planes,  it led to system instabilities and with this patch applied we no more see them.
(analysis further below in this mail).

Also when posted on amd-gfx with the corresponding RB's there was no objection to this patch not sure if nobody saw this patch getting merged.
Harry, can you please comment on this ?

That can be better answered by Shirish.

In my opinion the issue described by Michel is a lesser evil than attempting to allocate memory inside a spinlocked area.
Yes Harry. I agree, hence i pushed this patch as we have a code that attempts to allocate memory and furthermore calls mutex's.
Shirish, we need to understand why dc_commit_updates_for_stream() is trying to do a UPDATE_TYPE_FULL in your scenario. This should never really happen through the amdgpu_dm_do_flip code path and we need to understand why.
This happens in amdgpu_dm_do_flip() because of the hard-coded number of surfaces it call dc_commit_updates_for_stream() with,
        dc_commit_updates_for_stream(adev->dm.dc,
                                             surface_updates,
                                             1, <<<<<<<<<
acrtc_state->stream,
                                             NULL,
&surface_updates->surface,
                                             state);

which is 1, where as in case of rendering on multiple planes the stream has 2 surfaces attached and hence the below  condition fails in very first check of check_update_surfaces_for_stream()
"if ( ... || stream_status->plane_count != surface_count)"

  If we never do UPDATE_TYPE_FULL here we should not need to allocate memory and should be find with leaving the spinlock around dc_commit_updates_for_stream().
As i understand we may end-up trying to do a UPDATE_TYPE_FULL due to failure in any of the several checks in dc_check_update_surfaces_for_stream(). We cannot rule out the possibility of a code path being executed as long as it exists, ideally we should code another callback in amdgpu_dm_flip() that presumes the update type is always UPDATE_TYPE_FAST.

Till we fix this hard-coding i recommend that we should have this patch, am open to avoiding it if it breaks existing working systems or offends any builds.

Thanks.
Regards,
Shirish S

I believe that at least then we should put a BIG comment in the code explaining this change both to avoid future people looking into it trying to revert it believing it's just a plain bug and for possible proper revert when the underlying issue with sleeping in atomic context is resolved.

Andrey

Harry

Andrey

thanks,

greg k-h





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux