On 2018-08-16 02:18 PM, Christian König wrote:
> On 16.08.2018 at 18:50, Felix Kuehling wrote:
>> On 2018-08-16 02:43 AM, Christian König wrote:
>> [SNIP]
>>> I mean it could be that in the worst case we race and stop a KFD
>>> process for no good reason.
>> Right. For a more practical example, a KFD BO can get evicted just
>> before the application decides to unmap it. The preemption happens
>> asynchronously, handled by an SDMA job in the GPU scheduler. That job
>> will have an amdgpu_sync object with the eviction fence in it.
>>
>> While that SDMA job is pending or in progress, the application decides
>> to unmap the BO. That removes the eviction fence from that BO's
>> reservation. But it can't remove the fence from all the sync objects
>> that were previously created and are still in flight. So the preemption
>> will be triggered, and the fence will eventually signal when the KFD
>> preemption is complete.
>>
>> I don't think that's something we can prevent. The worst case is that a
>> preemption happens unnecessarily if an eviction gets triggered just
>> before removing the fence. But removing the fence will prevent future
>> evictions of the BO from triggering a KFD process preemption. That's the
>> best we can do.
>
> It's true that you can't drop the SDMA job which wants to evict the
> BO, but at this time the fence signaling is already underway and not
> stoppable anymore.
>
> Replacing the fence with a new one would just be much cleaner and
> fix quite a bunch of corner cases where the KFD process would be
> preempted without good reason.

Replacing the fence cleanly probably also involves a preemption, so you
don't gain anything.

Regards,
  Felix

>
> It's probably quite a bit more CPU overhead to do so, but I
> think that this would still be the more fail-proof option.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix