[PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3

ray.huang@xxxxxxx (Huang Rui) · Tue, 6 Jun 2017 22:52:47 +0800

On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
> On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher at gmail.com> wrote:
> > On Tue, Jun 6, 2017 at 7:22 AM, Christian König
> > <christian.koenig at amd.com> wrote:
> >>> Yes, I agree with you. That's also my original opinion.
> >>> But we encountered a random bug when we were calling
> >>> device_cache_fw_images.
> >>
> >> That looks like an upstream bug in device_cache_fw_images.
> >>
> >> We should probably open a bug report and ping the maintainer. Most likely we
> >> are not using the FW interface correctly, or we are triggering a rare bug or
> >> something like that.
> >>
> >>> So then I checked these functions and found gpu_info errors. The random bug
> >>> cannot be reproduced consistently. But we expect it to pass more than 30
> >>> cycles of S3 suspend and resume. Any ideas?
> >>
> >> I think the real solution is to just stop calling
> >> amdgpu_device_parse_gpu_info_fw() during resume.
> >
> > > Right. We only need to parse the firmware once during startup.
> 
> How are you hitting this on resume?  amdgpu_device_parse_gpu_info_fw() is
> called indirectly from amdgpu_device_init(), which is only called once
> at driver load time.
> 

Yes, I also noticed that. So I am confused about why firmware_class still
caches it during suspend.

Thanks,
Ray