On 13.12.2016 10:48, Christian König wrote:
>>>> The attached patch has fixed these crashes for me so far, but it's
>>>> very heavy-handed: it collects all page table shadows and the page
>>>> directory shadow and adds them all to the reservations for the callers
>>>> of amdgpu_vm_update_page_directory.
>>>
>>> That is most likely just a timing change, cause the shadows should end
>>> up in the duplicates list anyway. So the patch shouldn't have any
>>> effect.
>>
>> Okay, so the reason for the remaining crash is still unclear at least
>> for me.
>
> Yeah, that's a really good question. Can you share the call stack of the
> problem once more?

Pretty sure I found the root cause now.

amdgpu_vm_validate_pt_bos relies on the eviction counter to be able to
skip the validation of the page tables. However, moving the shadow page
tables out from mem_type TT to SYSTEM doesn't count as an eviction (it
just unbinds the mapping in the GTT). Clearly, that's a problem.

The quick fix is to skip the num_evictions check in
amdgpu_vm_validate_pt_bos. That has worked for me so far.

The next best thing is to add an unbind counter in addition to the
eviction counter that gets incremented whenever a BO is unbound (so it
counts a superset of what the eviction counter counts), and then check
that instead of the eviction counter.

Cheers,
Nicolai
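
To illustrate the counter idea, here is a minimal standalone sketch. It is
a toy model and not the actual amdgpu or TTM code: every struct and function
name in it (dev_model, vm_model, model_unbind_only, and so on) is
hypothetical, and it only assumes what is described above, namely that a
TT -> SYSTEM move unbinds a BO without bumping the eviction counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device counters. */
struct dev_model {
	uint64_t num_evictions; /* bumped only on real evictions */
	uint64_t num_unbinds;   /* proposed: bumped on every unbind */
};

/* Hypothetical stand-in for the per-VM snapshot taken at validation. */
struct vm_model {
	uint64_t last_evictions;
	uint64_t last_unbinds;
};

/* A real eviction also unbinds, so both counters advance. */
static void model_evict(struct dev_model *dev)
{
	dev->num_evictions++;
	dev->num_unbinds++;
}

/* TT -> SYSTEM move of a shadow: unbound from GTT, not an eviction. */
static void model_unbind_only(struct dev_model *dev)
{
	dev->num_unbinds++;
}

/* Record the counters after the page tables have been validated. */
static void model_validated(struct vm_model *vm, const struct dev_model *dev)
{
	vm->last_evictions = dev->num_evictions;
	vm->last_unbinds = dev->num_unbinds;
}

/* Skip decision based on the eviction counter alone. */
static bool skip_by_evictions(const struct vm_model *vm,
			      const struct dev_model *dev)
{
	return vm->last_evictions == dev->num_evictions;
}

/* Skip decision based on the proposed unbind counter. */
static bool skip_by_unbinds(const struct vm_model *vm,
			    const struct dev_model *dev)
{
	return vm->last_unbinds == dev->num_unbinds;
}

/* Walk through the problematic sequence. */
static void demo(void)
{
	struct dev_model dev = { 0, 0 };
	struct vm_model vm = { 0, 0 };

	model_validated(&vm, &dev);
	model_unbind_only(&dev); /* shadow moved TT -> SYSTEM */

	/* Eviction check wrongly allows skipping validation... */
	assert(skip_by_evictions(&vm, &dev));
	/* ...while the unbind check forces a revalidation. */
	assert(!skip_by_unbinds(&vm, &dev));
}
```

Since every eviction also bumps the unbind counter in this model, checking
the unbind counter never skips a validation that the eviction check would
have required.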