On Tue, Dec 12, 2017 at 4:18 AM, Liu, Monk <Monk.Liu at amd.com> wrote:
> NAK, your change breaks the SRIOV logic:
>
> Without lockup_timeout set, this gpu_recover() won't get called at all,
> unless your IB triggers an invalid instruction and that IRQ invokes
> amdgpu_gpu_recover(). For that case you should disable the logic in that
> IRQ handler instead of changing gpu_recover() itself, because for SRIOV
> we need gpu_recover() even when lockup_timeout is zero.

The default value of 0 indicates that GPU reset isn't ready to be enabled
by default. That's what it means. Once GPU reset works, the default should
be non-zero (e.g. 10000), and amdgpu.lockup_timeout=0 should be used to
disable all GPU resets in order to be able to do scandumps and debug GPU
hangs.

Marek
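To make the proposed semantics concrete, here is a minimal C sketch. It is not the actual amdgpu code: the names gpu_reset_allowed, enum reset_trigger and EXAMPLE_DEFAULT_LOCKUP_TIMEOUT_MS are hypothetical. It models only the rule argued for in the reply, namely that lockup_timeout=0 blocks every reset path, including one requested from an IRQ, while any non-zero value allows recovery. It does not model the SRIOV exception asked for in the quoted message.

```c
#include <stdbool.h>

/* Example default taken from the mail (10000 ms); hypothetical name. */
#define EXAMPLE_DEFAULT_LOCKUP_TIMEOUT_MS 10000

/* Hypothetical labels for the two ways a reset can be requested. */
enum reset_trigger {
    RESET_TRIGGER_JOB_TIMEOUT,     /* a job exceeded lockup_timeout */
    RESET_TRIGGER_ILLEGAL_INSN_IRQ /* e.g. an IB hit an invalid instruction */
};

/* Returns true if a GPU reset may proceed. Under the proposed semantics a
 * value of 0 vetoes every trigger, so the hung state is preserved for a
 * scandump; any non-zero value enables recovery. */
static bool gpu_reset_allowed(int lockup_timeout_ms, enum reset_trigger trigger)
{
    (void)trigger; /* the trigger is ignored: 0 disables all resets */
    return lockup_timeout_ms != 0;
}
```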