On 06/12/17 09:22, Guillaume Tucker wrote: > On 05/12/17 18:32, Ben Skeggs wrote: >> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote: >> >>> >>> On 04/12/17 18:37, Guillaume Tucker wrote: >>>> If the firmware fails to load then ->fini() will be called before the >>>> device has been initialised, causing the kernel to hang while trying >>>> to write to a register. Add a test in ->fini() to avoid this issue. >>>> >>>> This fixes a kernel hang on tegra124. >>>> >>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown") >>>> Signed-off-by: Guillaume Tucker <guillaume.tucker@xxxxxxxxxxxxx> >>>> CC: Ben Skeggs <bskeggs@xxxxxxxxxx> >>>> --- >>>> drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++-- >>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>>> index a3ba7f50198b..95e2aba64aad 100644 >>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base) >>>> } >>>> >>>> void >>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar) >>>> +gf100_bar_bar1_fini(struct nvkm_bar *base) >>>> { >>>> - nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000); >>>> + struct nvkm_device *device = base->subdev.device; >>>> + >>>> + if (base->subdev.oneinit) >>>> + nvkm_mask(device, 0x001704, 0x80000000, 0x00000000); >>>> } >>>> >>>> void >>> >>> I have tested this and it works for me. Thanks for fixing this! Would be >>> good to get Ben's ACK, but you can have my ... >>> >> I'd love to get a good explanation as to why it hangs without this >> change, >> as, on the surface, it's not immediately obvious as to why it's hanging. > > To be fair I'm not entirely sure either why this causes a hang, I > haven't read the TRM... The iomem has been mapped at this point, > so accessing the register should work. One clue is when you look > at _bar1_init(), the 0x1704 register is initialised with > some (device instance?) memory address. So it's possible that > the hardware does something special when you set this to 0 as in > _bar1_fini(), which may fail in particular if it was previously > not initialised with a valid address. > > This is merely guesswork, would be interested to find out the > real explanation though. OK, well that's no good. It's a good pointer, but we need to make sure we understand the root of this hang. I will see if I have sometime to dig into this further, maybe next week. Cheers Jon -- nvpublic -- To unsubscribe from this list: send the line "unsubscribe linux-tegra" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html