On 2018-09-14 10:21 PM, Felix Kuehling wrote:
> On 2018-09-14 01:52 PM, Christian König wrote:
>> On 2018-09-14 07:47 PM, Philip Yang wrote:
>>> On 2018-09-14 03:51 AM, Christian König wrote:
>>>> On 2018-09-13 11:51 PM, Felix Kuehling wrote:
>>>>> On 2018-09-13 04:52 PM, Philip Yang wrote:
>>>>> [SNIP]
>>>>>> +    amdgpu_mn_read_unlock(amn);
>>>>>> +
>>>>> amdgpu_mn_read_lock/unlock support recursive locking for multiple
>>>>> overlapping or nested invalidation ranges. But if you're locking and
>>>>> unlocking in the same function, is that still a concern?
>>>
>>> I don't understand the possible recursive case, but
>>> amdgpu_mn_read_lock() still supports recursive locking.
>>>
>>>> Well, the real problem is that unlocking them here won't work.
>>>>
>>>> We need to hold the lock until we are sure that the operation which
>>>> updates the page tables is completed.
>>>
>>> The reason for this change is that the HMM mirror has an
>>> invalidate_start callback, but no invalidate_end callback.
>>>
>>> Check mmu_notifier.c and hmm.c again; below is the entire logic to
>>> update the CPU page tables and the callbacks:
>>>
>>> The mn lock amn->lock is used to protect interval tree access because
>>> the user may submit/register new userptrs at any time.
>>> This is the same for the old and the new way.
>>>
>>> Step 2 guarantees the GPU operation is done before updating the CPU
>>> page tables.
>>>
>>> So I think the change is safe. We don't need to hold the mn lock until
>>> the CPU page table update is completed.
>>
>> No, that isn't even remotely correct. The lock doesn't protect the
>> interval tree.
>>
>>> Old:
>>>   1. down_read_non_owner(&amn->lock)
>>>   2. loop to handle BOs from node->bos through interval tree
>>>      amn->object nodes
>>>        gfx: wait for pending BO fence operations to finish, mark user
>>>        pages dirty
>>>        kfd: evict user queues of the process, wait for queue
>>>        unmap/map operations to finish
>>>   3. update CPU page tables
>>>   4. up_read(&amn->lock)
>>>
>>> New, with steps 3 and 4 switched:
>>>   1. down_read_non_owner(&amn->lock)
>>>   2. loop to handle BOs from node->bos through interval tree
>>>      amn->object nodes
>>>        gfx: wait for pending BO fence operations to finish, mark user
>>>        pages dirty
>>>        kfd: evict user queues of the process, wait for queue
>>>        unmap/map operations to finish
>>>   3. up_read(&amn->lock)
>>>   4. update CPU page tables
>>
>> The lock is there to make sure that we serialize page table updates
>> with command submission.
>
> As I understand it, the idea is to prevent command submission (adding
> new fences to BOs) while a page table invalidation is in progress.

Yes, exactly.

> But do we really need another lock for this? Wouldn't the re-validation
> of userptr BOs (currently calling get_user_pages) force synchronization
> with the ongoing page table invalidation through the mmap_sem or other
> MM locks?

No and yes. We don't hold any other locks while doing command submission,
but I expect that HMM has its own mechanism to prevent that.

Since we don't modify amdgpu_mn_lock()/amdgpu_mn_unlock() we are
certainly not using this mechanism correctly.

Regards,
Christian.
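[Editor's note] The serialization being debated above, where invalidation paths take amn->lock on the read side (recursively, for overlapping or nested ranges) while command submission takes it exclusively via amdgpu_mn_lock()/amdgpu_mn_unlock(), can be sketched with a minimal single-threaded userspace model. All names and fields below are illustrative placeholders, not the actual amdgpu implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the amdgpu_mn lock: the read side is held by
 * MMU-notifier invalidations (and may nest), the "cs" side models the
 * exclusive lock taken around command submission. */
struct amn_model {
	int recursion;   /* nested invalidation ranges holding the read side */
	bool cs_locked;  /* command submission holds the exclusive side */
};

/* Invalidation side: allows recursion for overlapping/nested ranges,
 * mirroring what amdgpu_mn_read_lock() is said to support above. */
static bool amn_read_trylock(struct amn_model *amn)
{
	/* A fresh invalidation must wait while command submission runs;
	 * a nested one (recursion > 0) may proceed. */
	if (amn->cs_locked && amn->recursion == 0)
		return false;
	amn->recursion++;
	return true;
}

static void amn_read_unlock(struct amn_model *amn)
{
	assert(amn->recursion > 0);
	amn->recursion--;
}

/* Command submission side: exclusive access, blocked while any
 * invalidation is still in flight. */
static bool amn_cs_trylock(struct amn_model *amn)
{
	if (amn->recursion > 0 || amn->cs_locked)
		return false;
	amn->cs_locked = true;
	return true;
}

static void amn_cs_unlock(struct amn_model *amn)
{
	amn->cs_locked = false;
}
```

In this model, Christian's objection is the window the patch opens: with the new ordering, the read side is dropped at step 3, so the command-submission side can acquire the lock, and add new fences to userptr BOs, before the CPU page tables are updated in step 4.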