On Tue, Jun 06, 2017 at 10:52:46PM +0800, Huang Rui wrote:
> On Tue, Jun 06, 2017 at 10:45:42PM +0800, Alex Deucher wrote:
> > On Tue, Jun 6, 2017 at 10:03 AM, Alex Deucher <alexdeucher at gmail.com> wrote:
> > > On Tue, Jun 6, 2017 at 7:22 AM, Christian König
> > > <christian.koenig at amd.com> wrote:
> > >>> Yes, I agree with you. That was also my original opinion.
> > >>> But we encountered a random bug when we were calling
> > >>> device_cache_fw_images.
> > >>
> > >> That looks like an upstream bug in device_cache_fw_images.
> > >>
> > >> We should probably open a bug report and ping the maintainer. Most likely we
> > >> are not using the FW interface correctly, or we are triggering a rare bug or
> > >> something like that.
> > >>
> > >>> So I checked these functions and found the gpu_info errors. The random bug
> > >>> cannot be reproduced consistently, but we expect it to pass more than 30
> > >>> cycles of S3 suspend and resume. Any ideas?
> > >>
> > >> I think the real solution is to just stop calling
> > >> amdgpu_device_parse_gpu_info_fw() during resume.
> > >
> > > Right. We only need to parse the firmware once during startup.
> >
> > How are we hitting this on resume? amdgpu_device_parse_gpu_info_fw() is
> > called indirectly from amdgpu_device_init(), which is only called once
> > at driver load time.
> >
>
> Yes, I noticed that too. So I am confused about why firmware_class still
> caches it during suspend.
>

By that time, we have already released the gpu_info firmware data. It seems to be
a bug in the upper layer.

Thanks,
Ray