[PATCH] drm/amdgpu: fix missed gpu info firmware when cache firmware during S3

ray.huang@xxxxxxx (Huang Rui) · Tue, 6 Jun 2017 22:55:48 +0800

On Tue, Jun 06, 2017 at 10:52:46PM +0800, Huang Rui wrote:
> On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
> > On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher at gmail.com> wrote:
> > > On Tue, Jun 6, 2017 at 7:22 AM, Christian König
> > > <christian.koenig at amd.com> wrote:
> > >>> Yes, I agree with you. That was also my original opinion.
> > >>> But we encountered a random bug when we were calling
> > >>> device_cache_fw_images.
> > >>
> > >> That looks like an upstream bug in device_cache_fw_images.
> > >>
> > >> We should probably open a bug report and ping the maintainer. Most likely we
> > >> are not using the FW interface correctly, or we are triggering a rare bug,
> > >> or something like that.
> > >>
> > >>> So then I checked these functions and found the gpu_info errors. The random
> > >>> bug cannot be reproduced consistently, but we expect the driver to pass more
> > >>> than 30 cycles of S3 suspend and resume. Any ideas?
> > >>
> > >> I think the real solution is to just stop calling
> > >> amdgpu_device_parse_gpu_info_fw() during resume.
> > >
> > > Right. We only need to parse the firmware once during startup.
> > 
> > How are you hitting this on resume? amdgpu_device_parse_gpu_info_fw() is
> > called indirectly from amdgpu_device_init() which is only called once
> > at driver load time.
> > 
> 
> Yes, I noticed that too. So I am confused about why firmware_class still
> caches it during suspend.
> 

By that time, we have already released the gpu_info firmware data, so this
looks like a bug in the upper layer.
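To make the question concrete, here is a simplified user-space model of how a released firmware can still be cached at suspend. It assumes (a simplification, not a statement of the exact firmware_class code) that the firmware core records the *name* of each successfully requested firmware per device, that release only frees the data without forgetting the name, and that the suspend-time cache pass re-requests every recorded name. All model_* identifiers and the file name "example_gpu_info.bin" are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space model, NOT kernel code: a per-device list of
 * firmware names that were requested successfully.
 */
#define MODEL_MAX_NAMES 8

struct model_dev {
	const char *names[MODEL_MAX_NAMES]; /* names recorded at request time */
	int nr_names;
	int loads;                          /* total firmware loads performed */
};

/* Model of a successful request: load the data and record the name once. */
static void model_request_fw(struct model_dev *dev, const char *name)
{
	int i;

	dev->loads++;
	for (i = 0; i < dev->nr_names; i++)
		if (strcmp(dev->names[i], name) == 0)
			return; /* name is already recorded */
	if (dev->nr_names < MODEL_MAX_NAMES)
		dev->names[dev->nr_names++] = name;
}

/* Model of release: the data would be freed, but the recorded name stays. */
static void model_release_fw(struct model_dev *dev, const char *name)
{
	(void)dev;
	(void)name;
}

/* Model of the suspend-time cache pass: reload every recorded name. */
static int model_cache_fw_images(struct model_dev *dev)
{
	int i;

	for (i = 0; i < dev->nr_names; i++) {
		printf("caching %s\n", dev->names[i]);
		dev->loads++;
	}
	return dev->nr_names;
}
```

In this model, parsing gpu_info once at init and releasing it right away is not enough to keep it out of the suspend-time cache pass: the name is still on the list, so the firmware is loaded a second time.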

Thanks,
Ray