On 2019-07-09 8:58 a.m., Zhou, David(ChunMing) wrote:
> I raised this when Christian made the page fault patch; in that patch,
> amdgpu_job_submit_direct uses an exclusive page fault ring for that.
>
> But if you use amdgpu_job_submit_direct for general rings occupied by
> the scheduler, I guess various bugs will happen.

The problem is that even the paging ring is used by the scheduler. There
are several places where buffer operations are submitted to the paging
ring through the scheduler. That makes any use of the paging ring through
direct submission problematic.

Even ignoring the scheduler, if it's possible for multiple threads to
submit to the paging ring, we'll need locking to ensure that the contents
of the ring remain consistent.

IIRC, the rings used to have locking before we had a GPU scheduler. For
comparison, see radeon_ring.c, which still has locking. With the GPU
scheduler, the rings became single-producer queues that no longer needed
locking. But with direct submission that is no longer true.

I think a good place to do that locking now would be in
amdgpu_ib_schedule.

Regards,
  Felix

>
> -David
>
> On 2019/7/9 12:53, Kuehling, Felix wrote:
>> I'm seeing some weird intermittent bugs (VM faults, hangs, etc.) when
>> trying to use amdgpu_job_submit_direct. I'm wondering if there is a
>> possibility of a race condition when a submit_direct and a GPU
>> scheduler thread try to submit to the same ring at the same time. I
>> didn't see any locking that would allow multiple threads to safely
>> submit to the same ring.
>>
>> Am I missing something?
>>
>> Thanks,
>>   Felix
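
As a rough illustration of the locking discussed above, here is a minimal user-space sketch, not amdgpu code. It assumes a hypothetical `demo_ring` with a pthread mutex standing in for whatever lock the kernel would use, and two submitters (one playing the scheduler thread, one playing direct submission) writing fixed-size packets to the same ring. All names (`demo_ring`, `demo_ring_submit`, `demo_run`, the tag values) are made up for the example. The point is only that the write pointer update and the packet contents have to be covered by one lock once the ring has more than one producer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u          /* dwords; power of two, multiple of PACKET_DWORDS */
#define PACKET_DWORDS 4u         /* hypothetical fixed packet size */
#define SUBMITS_PER_THREAD 10000u

/* Hypothetical stand-in for a hardware ring; not the amdgpu_ring layout. */
struct demo_ring {
    uint32_t buf[RING_SIZE];
    uint32_t wptr;               /* next free dword, wraps at RING_SIZE */
    uint64_t committed;          /* number of complete packets written */
    pthread_mutex_t lock;        /* serializes all producers */
};

struct demo_submitter {
    struct demo_ring *ring;
    uint32_t tag;                /* identifies the producer in the ring contents */
};

/* Write one packet. The lock covers both the data and the wptr update. */
static void demo_ring_submit(struct demo_ring *ring, uint32_t tag)
{
    pthread_mutex_lock(&ring->lock);
    for (uint32_t i = 0; i < PACKET_DWORDS; i++) {
        ring->buf[ring->wptr] = tag;
        ring->wptr = (ring->wptr + 1) & (RING_SIZE - 1);
    }
    ring->committed++;
    pthread_mutex_unlock(&ring->lock);
}

static void *demo_submit_thread(void *arg)
{
    struct demo_submitter *s = arg;

    for (uint32_t i = 0; i < SUBMITS_PER_THREAD; i++)
        demo_ring_submit(s->ring, s->tag);
    return NULL;
}

/* A packet is "torn" if its dwords come from more than one producer. */
static unsigned demo_count_torn(const struct demo_ring *ring)
{
    unsigned torn = 0;

    for (uint32_t p = 0; p < RING_SIZE; p += PACKET_DWORDS) {
        for (uint32_t i = 1; i < PACKET_DWORDS; i++) {
            if (ring->buf[p + i] != ring->buf[p]) {
                torn++;
                break;
            }
        }
    }
    return torn;
}

/* Run both producers against one ring; returns the number of torn packets. */
unsigned demo_run(uint64_t *committed_out)
{
    static struct demo_ring ring;
    struct demo_submitter sched = { &ring, 0x5C5C5C5Cu };   /* "scheduler" */
    struct demo_submitter direct = { &ring, 0xD1D1D1D1u };  /* "direct submit" */
    pthread_t t_sched, t_direct;
    int err;

    ring.wptr = 0;
    ring.committed = 0;
    pthread_mutex_init(&ring.lock, NULL);

    err = pthread_create(&t_sched, NULL, demo_submit_thread, &sched);
    assert(err == 0);
    err = pthread_create(&t_direct, NULL, demo_submit_thread, &direct);
    assert(err == 0);
    (void)err;

    pthread_join(t_sched, NULL);
    pthread_join(t_direct, NULL);
    pthread_mutex_destroy(&ring.lock);

    *committed_out = ring.committed;
    return demo_count_torn(&ring);
}
```

Calling `demo_run()` from a `main` and building with `-pthread` should report no torn packets. Removing the lock/unlock pair in `demo_ring_submit` makes the two producers race on `wptr`, which is the kind of interleaving that could corrupt ring contents in the scenario described in this thread.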