On 22.03.2018 08:18, Daniel Vetter wrote:
On Wed, Mar 21, 2018 at 12:54:20PM +0100, Christian König wrote:
On 21.03.2018 09:28, Daniel Vetter wrote:
On Tue, Mar 20, 2018 at 06:47:57PM +0100, Christian König wrote:
On 20.03.2018 15:08, Daniel Vetter wrote:
[SNIP]
For the in-driver reservation path (CS), having a slow-path that grabs a
temporary reference, drops the vram lock and then locks the reservation
normally (using the acquire context already used for the entire CS) is a
bit tricky, but totally feasible. TTM doesn't do that though.
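As a minimal sketch of such a slow-path (not actual amdgpu/ttm code; vram_lock, struct bo, bo_get()/bo_put() and pick_eviction_candidate() are made-up names, only the ww_mutex pattern matters):

struct bo {
        struct ww_mutex resv_lock;
        /* refcount, LRU link, ... */
};

static int reserve_eviction_candidate(struct ww_acquire_ctx *ctx,
                                      struct bo **out)
{
        struct bo *bo;
        int r = 0;

        spin_lock(&vram_lock);
        bo = pick_eviction_candidate();
        bo_get(bo);     /* temporary reference, BO must not vanish */

        if (!ww_mutex_trylock(&bo->resv_lock)) {
                /* Slow-path: drop the vram lock to break the locking
                 * circle, then take the reservation the normal,
                 * sleeping way with the acquire context already used
                 * for the entire CS.  May return -EDEADLK, in which
                 * case the caller backs off all its reservations. */
                spin_unlock(&vram_lock);
                r = ww_mutex_lock_interruptible(&bo->resv_lock, ctx);
        } else {
                spin_unlock(&vram_lock);
        }

        if (r)
                bo_put(bo);
        else
                *out = bo;      /* caller unlocks and drops the ref */
        return r;
}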
That is exactly what we do in amdgpu as well; it's just neither very
efficient nor reliable to retry getting the right pages for a submission
over and over again.
Out of curiosity, where's that code? I did read the ttm eviction code way
back, and that one definitely didn't do that. Would be interesting to
update my understanding.
That is in amdgpu_cs.c. amdgpu_cs_parser_bos() does a horrible dance with
grabbing, releasing and regrabbing locks in a loop.
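The pattern is roughly the following sketch (heavily simplified, not the actual code; reserve_all_bos(), userptr_pages_missing() and collect_user_pages() are hypothetical stand-ins for the helpers in amdgpu_cs.c):

static int cs_parser_bos_sketch(struct cs_parser *p)
{
        int r;

        for (;;) {
                /* Take the reservation locks of all BOs in the
                 * submission. */
                r = reserve_all_bos(p);
                if (r)
                        return r;

                /* Done if no userptr BO lost its pages in the
                 * meantime. */
                if (!userptr_pages_missing(p))
                        return 0;

                /* get_user_pages() takes mmap_sem and so must not run
                 * under the reservation locks; drop everything, get
                 * the pages and try again. */
                unreserve_all_bos(p);
                r = collect_user_pages(p);
                if (r)
                        return r;
        }
}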
Then in amdgpu_cs_submit() we grab a lock preventing page table updates and
check if all pages are still the ones we want to have:
amdgpu_mn_lock(p->mn);
if (p->bo_list) {
        for (i = p->bo_list->first_userptr;
             i < p->bo_list->num_entries; ++i) {
                struct amdgpu_bo *bo = p->bo_list->array[i].robj;

                if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
                        amdgpu_mn_unlock(p->mn);
                        return -ERESTARTSYS;
                }
        }
}
If anything changed in the page tables, we restart the whole IOCTL using
-ERESTARTSYS and try again.
I'm not talking about userptr here, but general bo eviction. Sorry for the
confusion.
The reason I'm dragging all the general bo management into this
discussion is that we do seem to have a fairly fundamental difference in
how that's done, with resulting consequences for the locking hierarchy.
And if this invalidate_mapping stuff is supposed to work, together with
userptr and everything else, I think we're required to agree on how this
is all supposed to nest, and how exactly we should back off for the other
side that needs to break the locking circle.
That aside, I don't entirely understand why you need to restart so much. I
figured that get_user_pages is ordered correctly against mmu
invalidations, but I get the impression you think that's not the case. How
does that happen?
Correct. I had the same assumption, but both Jerome and our internal
tests proved me wrong on that.
Key takeaway from that was that in neither the MMU notifier nor the
shrinker can you take any lock that is also held while calling kmalloc()
(or, simpler speaking, get_user_pages()).
On top of that, all kinds of locks might be held when the MMU notifier or
shrinker callback is invoked, so you basically can't assume you can do
things like recursive page table walks or call any of the dma_unmap_*
functions.
Thinking about it for a moment, that actually makes perfect sense: it
doesn't matter which ordering you establish between mmap_sem and your
buffer or allocation lock, it will simply conflict with other locks in
the system anyway.
The only solution to that problem we have found is the dance we have in
amdgpu now:
1. Inside invalidate_range_start you grab a recursive read side lock.
2. Wait for all GPU operations on those pages to finish.
3. Then mark all pages as dirty and accessed.
4. Inside invalidate_range_end you release your recursive read side lock
again.
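As a sketch of the notifier side (using the mmu_notifier callback signatures of that time; struct gpu_mn and the rec_read_*() recursive read lock are hypothetical, amdgpu builds the real thing around an rw_semaphore):

static void gpu_mn_invalidate_range_start(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start,
                                          unsigned long end)
{
        struct gpu_mn *gmn = container_of(mn, struct gpu_mn, mn);

        /* 1. Recursive read side, the callback may nest. */
        rec_read_lock(&gmn->lock);

        /* 2. Block until the GPU is done with the pages. */
        wait_for_gpu_access(gmn, start, end);

        /* 3. Tell the core MM what the GPU did to the pages. */
        mark_pages_dirty_and_accessed(gmn, start, end);
}

static void gpu_mn_invalidate_range_end(struct mmu_notifier *mn,
                                        struct mm_struct *mm,
                                        unsigned long start,
                                        unsigned long end)
{
        struct gpu_mn *gmn = container_of(mn, struct gpu_mn, mn);

        /* 4. Drop the recursive read side again. */
        rec_read_unlock(&gmn->lock);
}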
Then during command submission you do the following:
1. Take the locks and figure out which userptr BOs need new pages.
2. Drop the locks again and call get_user_pages().
3. Install the new page arrays and repeat from #1.
4. As soon as everybody has pages, update your page tables and prepare
the command submission.
5. Take the write side of our recursive lock.
6. Check if all pages are still valid, if not restart the whole IOCTL.
7. Submit the job to the hardware.
8. Drop the write side of our recursive lock.
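Steps 5-8 correspond to the amdgpu_mn_lock()/amdgpu_mn_unlock() snippet quoted above; as a sketch (rec_write_*() and the other helpers are again hypothetical stand-ins):

static int cs_submit_sketch(struct cs_parser *p)
{
        int r;

        /* 5. Write side of the recursive lock: excludes any concurrent
         * invalidate_range_start/end critical section. */
        rec_write_lock(&p->mn->lock);

        /* 6. If an invalidation raced with us, restart the whole
         * IOCTL. */
        if (userptr_pages_invalidated(p)) {
                rec_write_unlock(&p->mn->lock);
                return -ERESTARTSYS;
        }

        /* 7. From here on the pages are stable, push the job to the
         * hardware. */
        r = submit_job_to_hw(p);

        /* 8. Open the gate for invalidations again. */
        rec_write_unlock(&p->mn->lock);
        return r;
}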
Regards,
Christian.