[PATCH v4 2/2] drm/amdgpu: Free VGA stolen memory as soon as possible.

Andrey.Grodzovsky@xxxxxxx (Andrey Grodzovsky) · Thu, 12 Apr 2018 15:59:20 -0400

On 04/12/2018 11:10 AM, Michel Dänzer wrote:
> On 2018-04-12 04:58 PM, Andrey Grodzovsky wrote:
>> On 04/12/2018 10:33 AM, Michel Dänzer wrote:
>>> On 2018-04-12 03:33 PM, Alex Deucher wrote:
>>>> On Thu, Apr 12, 2018 at 7:17 AM, Andrey Grodzovsky
>>>> <Andrey.Grodzovsky at amd.com> wrote:
>>>>> On 04/12/2018 12:32 AM, Alex Deucher wrote:
>>>>>> On Thu, Apr 12, 2018 at 12:08 AM, Andrey Grodzovsky
>>>>>> <andrey.grodzovsky at amd.com> wrote:
>>>>>>> Reserved VRAM is used to avoid overwriting the pre-OS FB.
>>>>>>> Once our display stack takes over we don't need the reserved
>>>>>>> VRAM anymore.
>>>>>>>
>>>>>>> v2:
>>>>>>> Remove comment, we know actually why we need to reserve the stolen
>>>>>>> VRAM.
>>>>>>> Fix return type for amdgpu_ttm_late_init.
>>>>>>> v3:
>>>>>>> Return 0 in amdgpu_bo_late_init, rebase on changes to previous patch
>>>>>>> v4:
>>>>>>> Don't release stolen memory for GMC9 ASICs until GART corruption
>>>>>>> on S3 resume is resolved.
>>>>>>>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>> [...]
>> Not sure what it means?
> It means I trimmed some quoted text. Would it be clearer if I put
> quotation markers before it?

Got it now.

>
>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>>>>> index 252a6c69..099e3ce5 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>>>>>> @@ -659,6 +659,16 @@ static int gmc_v9_0_late_init(void *handle)
>>>>>>>          unsigned i;
>>>>>>>          int r;
>>>>>>>
>>>>>>> +       /*
>>>>>>> +        * TODO:
>>>>>>> +        * Currently there is a bug where some memory client outside
>>>>>>> +        * of the driver writes to first 8M of VRAM on S3 resume,
>>>>>>> +        * this overrides GART which by default gets placed in first 8M and
>>>>>>> +        * causes VM_FAULTS once GTT is accessed.
>>>>>>> +        * Keep the stolen memory reservation until this solved.
>>>>>>> +        */
>>>>>>> +       /* amdgpu_bo_late_init(adev); */
>>>>>>> +
>>>>>> We still need to free this somewhere.  I'd suggest calling it in
>>>>>> gmc_v9_0_sw_fini() and add a comment there about moving it when we fix
>>>>>> the issue.
>>>>>>
>>>>>>>          for(i = 0; i < adev->num_rings; ++i) {
>>>>>>>                  struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>                  unsigned vmhub = ring->funcs->vmhub;
>>>>>>> @@ -884,10 +894,6 @@ static int gmc_v9_0_sw_init(void *handle)
>>>>>>>           */
>>>>>>>          adev->gmc.mc_mask = 0xffffffffffffULL; /* 48 bit MC */
>>>>>>>
>>>>>>> -       /*
>>>>>>> -        * It needs to reserve 8M stolen memory for vega10
>>>>>>> -        * TODO: Figure out how to avoid that...
>>>>>>> -        */
>>>>>>>          adev->gmc.stolen_size = gmc_v9_0_get_vbios_fb_size(adev);
>>>>>> We may also just want to return 8MB or 9MB temporarily in
>>>>>> gmc_v9_0_get_vbios_fb_size until we sort out the root cause of the S3
>>>>>> issue otherwise we're potentially wasting a lot more memory.
>>>>> But what if we have a 4k display? In this case returning 9M probably
>>>>> will not hide the corruption we were originally dealing with. I
>>>>> remember that in that case the pre-OS FB size would be 32M.
>>>> I guess it's a trade-off: possible momentary garbage during the BIOS to
>>>> driver transition vs. wasting an additional 24 MB of CPU accessible
>>>> vram for the life of the driver.
>>> Can we free the reserved memory after initialization, then reserve it
>>> again on resume?
>> The issue here was that someone overwrites the first 8M of VRAM and
>> corrupts the GART table, which causes VM_FAULTS. Until we find who is
>> writing into this area of VRAM and when exactly, I think we cannot allow
>> any data to be placed there since it might get corrupted (even if we
>> avoid placing the GART table there).
> I think it shouldn't be too hard in general to make sure the GART table,
> and any other BOs which stay in VRAM across suspend/resume, don't fall
> in the affected area.

I'm not sure how we could easily find all objects of this kind across
all IPs. Also, how can we be sure the affected region is overwritten
only during resume? We know that the GART table is overwritten during
resume, but we can't be sure that other areas in that region are not
overwritten at other times, and if they are, it's dangerous to allow
allocations there.

Andrey

>
>
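For reference, the 8M and 32M pre-OS FB sizes mentioned in this thread are
consistent with a 32-bit-per-pixel scanout surface at 1920x1080 and
3840x2160. The sketch below only illustrates that arithmetic. It assumes
4 bytes per pixel and a tightly packed surface with no pitch alignment, and
fb_size_bytes() and bytes_to_mib_ceil() are hypothetical helpers, not
amdgpu functions; the real value comes from gmc_v9_0_get_vbios_fb_size().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of amdgpu): bytes needed for a tightly
 * packed width x height surface. Assumes no pitch/alignment padding.
 */
static uint64_t fb_size_bytes(uint32_t width, uint32_t height,
                              uint32_t bytes_per_pixel)
{
        /* A zero-sized pixel format would make the result meaningless. */
        assert(bytes_per_pixel > 0);
        return (uint64_t)width * height * bytes_per_pixel;
}

/* Hypothetical helper: round a byte count up to whole MiB. */
static uint64_t bytes_to_mib_ceil(uint64_t bytes)
{
        const uint64_t mib = 1024ULL * 1024ULL;

        return (bytes + mib - 1) / mib;
}
```

Under these assumptions a 1920x1080 surface rounds up to 8 MiB and a
3840x2160 surface to 32 MiB, which lines up with the 8M and 32M figures
above and with the 24 MB difference Alex mentions.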