[PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3

alexdeucher@xxxxxxxxx (Alex Deucher) · Tue, 6 Jun 2017 10:52:32 -0400




On Tue, Jun 6, 2017 at 10:52 AM, Huang Rui <ray.huang at amd.com> wrote:
> On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
>> On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher at gmail.com> wrote:
>> > On Tue, Jun 6, 2017 at 7:22 AM, Christian KÃ¶nig
>> > <christian.koenig at amd.com> wrote:
>> >>> Yes, I agree with you. That's also my orignal opinion.
>> >>> But we encountered a random buggy when we were calling
>> >>> device_cache_fw_images.
>> >>
>> >> That looks like an upstream bug in device_cache_fw_images.
>> >>
>> >> We should probably open a bug report and ping the maintainer. Most likely we
>> >> are not correctly using the FW interface or trigger a rare bug or something
>> >> like this.
>> >>
>> >>> So then I check these functions and find gpu_info errors. The random buggy
>> >>> cannot be reproduced constantly.But we expected it can pass more than 30
>> >>> cycles
>> >>> of S3 suspend and resume. Any ideas?
>> >>
>> >> I think the real solution is to just stop calling
>> >> amdgpu_device_parse_gpu_info_fw() during resume.
>> >
>> > Right.  we only need to parse the firmware once during startup.
>>
>> How are hitting this on resume?  amdgpu_device_parse_gpu_info_fw() is
>> called indirectly from amdgpu_device_init() which is only called once
>> at driver load time.
>>
>
> Yes, I also noted it. So I am confused with why firmware_class will still
> cache it during suspend.

I guess request_firmware expects the driver to keep the firmware
around until the driver is unloaded.  Just one comment about the
patch:

@@ -1272,6 +1272,9 @@ struct amdgpu_firmware {
        const struct amdgpu_psp_funcs *funcs;
        struct amdgpu_bo *rbuf;
        struct mutex mutex;
+
+       /* gpu info firmware data pointer */
+       const struct firmware *fw;
 };

Call this gpu_info_fw rather than just fw.  With that, the patch is:
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>

Alex