On 2021-04-22 9:20 a.m., Felix Kuehling wrote:
> On 2021-04-22 at 9:08 a.m., philip yang wrote:
>> On 2021-04-20 9:25 p.m., Felix Kuehling wrote:
>>>> @@ -2251,14 +2330,34 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>>>>  	}
>>>>  	mmap_read_lock(mm);
>>>> +retry_write_locked:
>>>>  	mutex_lock(&svms->lock);
>>>>  	prange = svm_range_from_addr(svms, addr, NULL);
>>>>  	if (!prange) {
>>>>  		pr_debug("failed to find prange svms 0x%p address [0x%llx]\n",
>>>>  			 svms, addr);
>>>> -		r = -EFAULT;
>>>> -		goto out_unlock_svms;
>>>> +		if (!write_locked) {
>>>> +			/* Need the write lock to create new range with MMU notifier.
>>>> +			 * Also flush pending deferred work to make sure the interval
>>>> +			 * tree is up to date before we add a new range
>>>> +			 */
>>>> +			mutex_unlock(&svms->lock);
>>>> +			mmap_read_unlock(mm);
>>>> +			svm_range_list_lock_and_flush_work(svms, mm);
>>>
>>> I think this can deadlock with a deferred worker trying to drain
>>> interrupts (Philip's patch series). If we cannot flush deferred work
>>> here, we need to be more careful creating new ranges to make sure
>>> they don't conflict with added deferred or child ranges.
>>
>> It's impossible to have a deadlock with the deferred worker draining
>> interrupts, because draining interrupts waits for restore_pages
>> without taking any lock, and restore_pages flushes deferred work
>> without taking any lock either.
>
> The deadlock does not come from holding or waiting for locks. It comes
> from the worker waiting for interrupts to drain and the interrupt
> handler waiting for the worker to finish with flush_work in
> svm_range_list_lock_and_flush_work. If both are waiting for each other,
> neither can make progress and you have a deadlock.

Yes, you are right. I can reproduce the deadlock after changing the kfdtest. We cannot flush deferred work here.
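
To make the circular wait concrete, here is a minimal userspace sketch in C. It is not kernel code and does not call any amdgpu or workqueue API. The actor names, `struct wait_graph`, and all helper functions are hypothetical, introduced only for illustration. It models "A cannot proceed until B finishes" as a directed edge and checks for a cycle, under the assumption that draining interrupts waits for the fault handler and that flush_work in the handler waits for the deferred worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical actors for illustration only; these are not kernel types. */
enum {
	DEFERRED_WORKER,	/* deferred work item that drains interrupts */
	FAULT_HANDLER,		/* interrupt-side restore_pages path */
	NUM_ACTORS
};

struct wait_graph {
	/* waits_for[a][b] is true when a cannot proceed until b finishes */
	bool waits_for[NUM_ACTORS][NUM_ACTORS];
};

static void add_wait(struct wait_graph *g, int waiter, int target)
{
	assert(waiter >= 0 && waiter < NUM_ACTORS);
	assert(target >= 0 && target < NUM_ACTORS);
	g->waits_for[waiter][target] = true;
}

/* Depth-first search: true if a wait chain from node leads back onto the stack. */
static bool dfs(const struct wait_graph *g, int node, bool *on_stack, bool *done)
{
	if (on_stack[node])
		return true;
	if (done[node])
		return false;
	on_stack[node] = true;
	for (int next = 0; next < NUM_ACTORS; next++)
		if (g->waits_for[node][next] && dfs(g, next, on_stack, done))
			return true;
	on_stack[node] = false;
	done[node] = true;
	return false;
}

/* A cycle in the wait-for graph means no actor in it can ever make progress. */
static bool has_circular_wait(const struct wait_graph *g)
{
	bool on_stack[NUM_ACTORS];
	bool done[NUM_ACTORS];

	memset(on_stack, 0, sizeof(on_stack));
	memset(done, 0, sizeof(done));
	for (int start = 0; start < NUM_ACTORS; start++)
		if (dfs(g, start, on_stack, done))
			return true;
	return false;
}

/* Assumed scenario from this thread: the handler flushes deferred work. */
static struct wait_graph graph_with_flush_in_handler(void)
{
	struct wait_graph g;

	memset(&g, 0, sizeof(g));
	add_wait(&g, DEFERRED_WORKER, FAULT_HANDLER);	/* drain waits for handler */
	add_wait(&g, FAULT_HANDLER, DEFERRED_WORKER);	/* flush_work waits for worker */
	return g;
}

/* Same model with the flush removed from the handler path. */
static struct wait_graph graph_without_flush(void)
{
	struct wait_graph g;

	memset(&g, 0, sizeof(g));
	add_wait(&g, DEFERRED_WORKER, FAULT_HANDLER);	/* drain still waits for handler */
	return g;
}
```

Passing the result of graph_with_flush_in_handler() to has_circular_wait() reports a cycle even though no lock appears anywhere in the model, while graph_without_flush() does not. That matches the point above: the wait is on completion, not on a lock.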
Regards,
Philip

> Regards,
> Felix
>> Regards,
>> Philip
>>> Regards,
>>> Felix
>>>> +			write_locked = true;
>>>> +			goto retry_write_locked;
>>>> +		}
>>>> +		prange = svm_range_create_unregistered_range(adev, p, mm, addr);
>>>> +		if (!prange) {
>>>> +			pr_debug("failed to create unregisterd range svms 0x%p address [0x%llx]\n",
>>>> +				 svms, addr);
>>>> +			mmap_write_downgrade(mm);
>>>> +			r = -EFAULT;
>>>> +			goto out_unlock_svms;
>>>> +		}
>>>>  	}
>>>> +	if (write_locked)
>>>> +		mmap_write_downgrade(mm);
>>>>  	mutex_lock(&prange->migrate_mutex);

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx