For the B path, we take mm->mmap_sem, then call hmm_vma_get_pfns() or get_user_pages(). This is straightforward.

For the A path, the mmu notifier pair
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end()
is called in many places, and the calling paths inside mm are quite
complicated, so it's not obvious. I checked many of them, for
example:
do_munmap()
    down_write(&mm->mmap_sem)
    arch_unmap()
        mpx_notify_unmap()...
            zap_bt_entries_mapping()
                zap_page_range()
    up_write(&mm->mmap_sem)
void zap_page_range(struct vm_area_struct *vma, unsigned long start,
		unsigned long size)
{
	struct mm_struct *mm = vma->vm_mm;
	struct mmu_gather tlb;
	unsigned long end = start + size;

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm, start, end);
	update_hiwater_rss(mm);
	mmu_notifier_invalidate_range_start(mm, start, end);
	for ( ; vma && vma->vm_start < end; vma = vma->vm_next)
		unmap_single_vma(&tlb, vma, start, end, NULL);
	mmu_notifier_invalidate_range_end(mm, start, end);
	tlb_finish_mmu(&tlb, start, end);
}
So AFAIK it's okay without the invalidate_range_end() callback.
Regards,
Philip
On 2018-09-28 01:25 AM, Koenig, Christian wrote:
No, that is incorrect as well :)
The mmap_sem isn't necessarily taken during page table updates.

What you could do is replace get_user_pages() directly with HMM. If I'm not completely mistaken, that should work as expected.
Christian.
I was trying to understand how HMM handles this concurrency issue and how we handle it in amdgpu_ttm_tt_userptr_needs_pages() and amdgpu_ttm_tt_affect_userptr(). HMM uses the range->valid flag, we use gtt->mmu_invalidations and gtt->last_set_pages. Both actually use the same lock-plus-flag idea.
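To illustrate that common idea, here is a minimal sketch of the lock-plus-flag pattern (hypothetical names, not the actual HMM or amdgpu code; invalidation_lock and invalidation_seq are made up for the example):

/* Notifier side: bump a sequence counter under the lock from
 * invalidate_range_start(). Names are made up for illustration. */
static DEFINE_MUTEX(invalidation_lock);
static unsigned long invalidation_seq;

static void notifier_invalidate(void)
{
	mutex_lock(&invalidation_lock);
	invalidation_seq++;
	mutex_unlock(&invalidation_lock);
}

/* User side: snapshot the counter, pin the pages, then re-check the
 * counter under the same lock and restart if an invalidation raced. */
static int userptr_validate(void)
{
	unsigned long seq;

retry:
	seq = READ_ONCE(invalidation_seq);
	/* pin/snapshot the user pages here, e.g. get_user_pages() */
	mutex_lock(&invalidation_lock);
	if (seq != invalidation_seq) {
		/* pages were invalidated while we were pinning, start over */
		mutex_unlock(&invalidation_lock);
		goto retry;
	}
	/* add the fence / submit the job while still holding the lock */
	mutex_unlock(&invalidation_lock);
	return 0;
}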
Thanks for the information. Now I understand that the fence ttm_eu_fence_buffer_objects() puts on the BOs will block CPU page table updates. This is another side of this concurrency issue I didn't know about.

I had the same worry that there would be an issue without the invalidate_range_end() callback, given the calling sequence Felix lists. After taking another look today, I now think it's fine because of the mm->mmap_sem usage. This is my understanding:
A path:
    down_write(&mm->mmap_sem);
    mmu_notifier_invalidate_range_start()
        take_lock()
        release_lock()
    CPU page table update
    mmu_notifier_invalidate_range_end()
    up_write(&mm->mmap_sem);
B path:
again:
    down_read(&mm->mmap_sem);
    hmm_vma_get_pfns()
    up_read(&mm->mmap_sem);
    ....
    take_lock()
    if (!hmm_vma_range_done()) {
        release_lock()
        goto again
    }
    submit command job...
    release_lock()
If you agree, I will submit patch v5 with some minor changes, and then submit another patch to replace get_user_pages() with HMM.
Regards,
Philip
On 2018-09-27 11:36 AM, Christian König wrote:
Yeah, I've read that as well.
My best guess is that we just need to add a call to
hmm_vma_range_done() after taking the lock and also replace
get_user_pages() with hmm_vma_get_pfns().
But I'm still not 100% sure how all of that is supposed to
work together.
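A rough sketch of that combination, modeled on the example in Documentation/vm/hmm.rst (the hmm_range setup is abbreviated, and take_lock()/release_lock() are placeholders for whatever driver lock serializes against the notifier):

static int userptr_get_pages_hmm(struct mm_struct *mm,
				 struct hmm_range *range)
{
	int ret;

again:
	down_read(&mm->mmap_sem);
	ret = hmm_vma_get_pfns(range);	/* replaces get_user_pages() */
	up_read(&mm->mmap_sem);
	if (ret)
		return ret;

	/* ... prepare the command submission ... */

	take_lock();	/* placeholder for the driver-side lock */
	if (!hmm_vma_range_done(range)) {
		/* a CPU page table update raced with us, start over */
		release_lock();
		goto again;
	}
	/* add the fence and submit while still holding the lock */
	release_lock();
	return 0;
}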
Regards,
Christian.
Am 27.09.2018 um 16:50 schrieb Kuehling, Felix:
I think the answer is here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216
Regards,
Felix
At least with get_user_pages() that is perfectly possible.

For HMM it could be that this is prevented somehow.

> In this case you can end up accessing pages which are invalidated
> while get_user_pages is in process.

What’s the sequence of events you have in mind? Something like this?
- Page table is updated and triggers MMU notifier
- amdgpu MMU notifier runs and waits for pending CS to finish while holding the read lock
- New CS starts just after the invalidate_range_start MMU notifier finishes but before the page table update is done
- get_user_pages returns outdated physical addresses
I hope that’s not actually possible and that get_user_pages or hmm_vma_fault would block until the page table update is done. That is, invalidate_range_start marks the start of a page table update, and while that update is in progress, get_user_pages or hmm_vma_fault block. Jerome, can you comment on that?

Thanks,
Felix
Yeah I understand that, but again that won't work.

In this case you can end up accessing pages which are invalidated while get_user_pages is in process.
> I’m not planning to change that. I don’t think there is any need to
> change it.
>
> Yeah, but when HMM doesn't provide both the start and the end hook of
> the invalidation this way won't work any more.
>
> So we need to find a solution for this,
> Christian.
My whole argument is that you don’t need to hold the read lock until the invalidate_range_end. Just read_lock and read_unlock in the invalidate_range_start function.

Regards,
Felix
Am 27.09.2018 um 15:18 schrieb Kuehling, Felix:
> The problem is here:
>
>     ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>     amdgpu_mn_unlock(p->mn);
>
> We need to hold the lock until the fence is added to the reservation
> object.
>
> Otherwise somebody could have changed the page tables just in the
> moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and
> adding the fence to the reservation object.
I’m not planning to change that. I don’t think there is any need to change it.
Yeah, but when HMM doesn't provide both the start and the end hook of the invalidation this way won't work any more.

So we need to find a solution for this,
Christian.
Regards,
Felix
Am 27.09.2018 um 13:08 schrieb Kuehling, Felix:
> We double check that there wasn't any page table modification while we
> prepared the submission and restart the whole process when there
> actually was some update.
>
> The reason why we need to do this is here:
>
>     ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
>     amdgpu_mn_unlock(p->mn);
>
> Only after the new fence is added to the buffer object can we release
> the lock so that any invalidation will now block on our command
> submission to finish before it modifies the page table.
I don’t see why this requires holding the read-lock until invalidate_range_end. amdgpu_ttm_tt_affect_userptr gets called while the mn read-lock is held in the invalidate_range_start notifier.
That's not related to amdgpu_ttm_tt_affect_userptr(), this function could actually be called outside the lock.

The problem is here:

    ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
    amdgpu_mn_unlock(p->mn);

We need to hold the lock until the fence is added to the reservation object.

Otherwise somebody could have changed the page tables just in the moment between the check of amdgpu_ttm_tt_userptr_needs_pages() and adding the fence to the reservation object.
Regards,
Christian.
Regards,
Felix
That is correct, but take a look at what we do after calling amdgpu_mn_read_lock():

	/* No memory allocation is allowed while holding the mn lock */
	amdgpu_mn_lock(p->mn);
	amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
		struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);

		if (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
			r = -ERESTARTSYS;
			goto error_abort;
		}
	}
We double check that there wasn't any page table modification while we prepared the submission and restart the whole process when there actually was some update.
The reason why we need to do this is here:

	ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
	amdgpu_mn_unlock(p->mn);
Only after the new fence is added to the buffer object can we release the lock, so that any invalidation will now block on our command submission to finish before it modifies the page table.
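For reference, the notifier side that this ordering relies on looks roughly like this (a simplified sketch along the lines of amdgpu_mn.c, not the exact code):

static void mn_invalidate_node(struct amdgpu_mn_node *node,
			       unsigned long start, unsigned long end)
{
	struct amdgpu_bo *bo;
	long r;

	list_for_each_entry(bo, &node->bos, mn_list) {
		if (!amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm, start, end))
			continue;

		/* blocks on every fence on the BO, including the one the CS
		 * added before dropping the mn lock */
		r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
							true, false,
							MAX_SCHEDULE_TIMEOUT);
		if (r <= 0)
			DRM_ERROR("(%ld) failed to wait for user bo\n", r);

		/* mark the user pages dirty / stale here */
	}
}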
The only other option would be to add the fence first and then check if there was any update to the page tables.

The issue with that approach is that adding a fence can't be undone, so if we find that there actually was an update to the page tables we would need to somehow turn the CS into a dummy (e.g. overwrite all IBs with NOPs or something like that) and still submit it.

Not sure if that is actually possible.
Regards,
Christian.
Am 27.09.2018 um 10:47 schrieb Kuehling, Felix:
So back to my previous question:

>> But do we really need another lock for this? Wouldn't the
>> re-validation of userptr BOs (currently calling get_user_pages) force
>> synchronization with the ongoing page table invalidation through the
>> mmap_sem or other MM locks?
>
> No and yes. We don't hold any other locks while doing command
> submission, but I expect that HMM has its own mechanism to prevent that.
>
> Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we are
> certainly not using this mechanism correctly.
The existing amdgpu_mn_lock/unlock should block the MMU notifier while a command submission is in progress. It should also block command submission while an MMU notifier is in progress.
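Schematically, that mutual exclusion reduces to a plain rwsem (a hypothetical sketch; names are placeholders, and the real amdgpu_mn code additionally handles recursion and non-owner release):

static struct rw_semaphore mn_lock;

/* command submission side: exclusive, blocks all notifiers */
void cs_lock(void)   { down_write(&mn_lock); }	/* like amdgpu_mn_lock() */
void cs_unlock(void) { up_write(&mn_lock); }	/* like amdgpu_mn_unlock() */

/* invalidate_range_start side: shared, blocks command submission */
void notifier_lock(void)   { down_read(&mn_lock); }	/* like amdgpu_mn_read_lock() */
void notifier_unlock(void) { up_read(&mn_lock); }	/* like amdgpu_mn_read_unlock() */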
What we lose with HMM is the ability to hold a read-lock for the entire duration from invalidate_range_start until invalidate_range_end. As I understand it, that lock is meant to prevent new command submissions while the page tables are being updated by the kernel. But my point is that get_user_pages or hmm_vma_fault should do the same kind of thing. Before the command submission can go ahead, it needs to update the userptr addresses. If the page tables are still being updated, it will block there even without holding the amdgpu_mn_read_lock.
Regards,
Felix
No, that won't work. We would still run into lock inversion problems.
What we could do with the scheduler is to turn submissions into dummies if we find that the page tables are now outdated.

But that would be really hacky and I'm not sure if that would really work in all cases.
I had a chat with Jerome yesterday. He pointed out that the new blockable parameter can be used to infer whether the MMU notifier is being called in a reclaim operation. So if blockable==true, it should even be safe to take the BO reservation lock without problems. I think with that we should be able to remove the read-write locking completely and go back to locking (or try-locking for blockable==false) the reservation locks in the MMU notifier?
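A sketch of what that could look like, assuming the 4.19-style invalidate_range_start signature with the blockable flag (the BO list walk and the struct layout are simplified placeholders, not the real interval tree):

static int amdgpu_mn_invalidate_range_start(struct mmu_notifier *mn,
					    struct mm_struct *mm,
					    unsigned long start,
					    unsigned long end,
					    bool blockable)
{
	struct amdgpu_mn *amn = container_of(mn, struct amdgpu_mn, mn);
	struct amdgpu_bo *bo;
	int r;

	/* placeholder walk: the real code looks BOs up in an interval tree */
	list_for_each_entry(bo, &amn->userptr_bos, mn_list) {
		if (blockable) {
			/* safe to sleep on the reservation lock */
			r = ttm_bo_reserve(&bo->tbo, false, false, NULL);
		} else {
			/* reclaim context: only try-lock, never sleep */
			r = ttm_bo_reserve(&bo->tbo, false, true, NULL);
			if (r)
				return -EAGAIN;
		}

		/* wait for fences, mark user pages dirty, ... */

		ttm_bo_unreserve(&bo->tbo);
	}

	return 0;
}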
Regards,
Felix
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König
Sent: Saturday, September 15, 2018 3:47 AM
To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Yang, Philip <Philip.Yang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Jerome Glisse <j.glisse@xxxxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: use HMM mirror callback to replace mmu notifier v4
Am 14.09.2018 um 22:21 schrieb Felix Kuehling:
> On 2018-09-14 01:52 PM, Christian König wrote:
>> Am 14.09.2018 um 19:47 schrieb Philip Yang:
>>> On 2018-09-14 03:51 AM, Christian König wrote:
>>>> Am 13.09.2018 um 23:51 schrieb Felix Kuehling:
>>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>>>>> [SNIP]
>>>>>> +		amdgpu_mn_read_unlock(amn);
>>>>>> +
>>>>> amdgpu_mn_read_lock/unlock support recursive locking for multiple
>>>>> overlapping or nested invalidation ranges. But if you're locking
>>>>> and unlocking in the same function, is that still a concern?
>>> I don't understand the possible recursive case, but
>>> amdgpu_mn_read_lock() still supports recursive locking.
>>>> Well the real problem is that unlocking them here won't work.
>>>>
>>>> We need to hold the lock until we are sure that the operation which
>>>> updates the page tables is completed.
>>>>
>>> The reason for this change is that the hmm mirror has an
>>> invalidate_start callback, but no invalidate_end callback.
>>>
>>> Check mmu_notifier.c and hmm.c again, below is the entire logic to
>>> update CPU page tables and the callbacks:
>>>
>>> The mn lock amn->lock is used to protect interval tree access because
>>> a user may submit/register a new userptr anytime.
>>> This is the same for the old and the new way.
>>>
>>> Step 2 guarantees the GPU operation is done before updating the CPU
>>> page table.
>>>
>>> So I think the change is safe. We don't need to hold the mn lock until
>>> the CPU page table update is completed.
>> No, that isn't even remotely correct. The lock doesn't protect the
>> interval tree.
>>
>>> Old:
>>> 1. down_read_non_owner(&amn->lock)
>>> 2. loop to handle BOs from node->bos through interval tree
>>>    amn->object nodes
>>>        gfx: wait for pending BOs fence operation done, mark user
>>>             pages dirty
>>>        kfd: evict user queues of the process, wait for queue
>>>             unmap/map operation done
>>> 3. update CPU page tables
>>> 4. up_read(&amn->lock)
>>>
>>> New (steps 3 and 4 switched):
>>> 1. down_read_non_owner(&amn->lock)
>>> 2. loop to handle BOs from node->bos through interval tree
>>>    amn->object nodes
>>>        gfx: wait for pending BOs fence operation done, mark user
>>>             pages dirty
>>>        kfd: evict user queues of the process, wait for queue
>>>             unmap/map operation done
>>> 3. up_read(&amn->lock)
>>> 4. update CPU page tables
>> The lock is there to make sure that we serialize page table updates
>> with command submission.
> As I understand it, the idea is to prevent command submission (adding
> new fences to BOs) while a page table invalidation is in progress.
Yes, exactly.
> But do we really need another lock for this? Wouldn't the
> re-validation of userptr BOs (currently calling get_user_pages) force
> synchronization with the ongoing page table invalidation through the
> mmap_sem or other MM locks?
No and yes. We don't hold any other locks while doing command submission, but I expect that HMM has its own mechanism to prevent that.

Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we are certainly not using this mechanism correctly.
Regards,
Christian.