On Mon, Feb 6, 2023 at 2:15 AM Christian König <christian.koenig@xxxxxxx> wrote:
>
> Am 03.02.23 um 19:10 schrieb Rob Clark:
> > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > If userspace calls the AMDGPU_CS ioctl from multiple threads, because
> > the vm is global to the drm_file, you can end up with multiple threads
> > racing in amdgpu_vm_clear_freed(). So the freed list should be
> > protected with the status_lock, similar to other vm lists.
>
> Well this is nonsense. To process the freed list the VM root PD lock
> must be held anyway.
>
> If we have a call path where this isn't true then we have a major bug at
> a different place here.

I'm not super familiar with the amdgpu cs parser code, but the only
protection I'm seeing is the bo_list_mutex, and it isn't clear to me
that it is 1:1 with the vm (it looks like it is not).

(I cc'd you on the bug report, jfyi)

BR,
-R

>
> Regards,
> Christian.
>
> >
> > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 33 ++++++++++++++++++++++----
> >   1 file changed, 29 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > index b9441ab457ea..aeed7bc1512f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > @@ -1240,10 +1240,19 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
> >       struct amdgpu_bo_va_mapping *mapping;
> >       uint64_t init_pte_value = 0;
> >       struct dma_fence *f = NULL;
> > +     struct list_head freed;
> >       int r;
> >
> > -     while (!list_empty(&vm->freed)) {
> > -             mapping = list_first_entry(&vm->freed,
> > +     /*
> > +      * Move the contents of the VM's freed list to a local list
> > +      * that we can iterate without racing against other threads:
> > +      */
> > +     spin_lock(&vm->status_lock);
> > +     list_replace_init(&vm->freed, &freed);
> > +     spin_unlock(&vm->status_lock);
> > +
> > +     while (!list_empty(&freed)) {
> > +             mapping = list_first_entry(&freed,
> >                       struct amdgpu_bo_va_mapping, list);
> >               list_del(&mapping->list);
> >
> > @@ -1258,6 +1267,15 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
> >               amdgpu_vm_free_mapping(adev, vm, mapping, f);
> >               if (r) {
> >                       dma_fence_put(f);
> > +
> > +                     /*
> > +                      * Move any unprocessed mappings back to the freed
> > +                      * list:
> > +                      */
> > +                     spin_lock(&vm->status_lock);
> > +                     list_splice_tail(&freed, &vm->freed);
> > +                     spin_unlock(&vm->status_lock);
> > +
> >                       return r;
> >               }
> >       }
> > @@ -1583,11 +1601,14 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
> >       mapping->bo_va = NULL;
> >       trace_amdgpu_vm_bo_unmap(bo_va, mapping);
> >
> > -     if (valid)
> > +     if (valid) {
> > +             spin_lock(&vm->status_lock);
> >               list_add(&mapping->list, &vm->freed);
> > -     else
> > +             spin_unlock(&vm->status_lock);
> > +     } else {
> >               amdgpu_vm_free_mapping(adev, vm, mapping,
> >                                      bo_va->last_pt_update);
> > +     }
> >
> >       return 0;
> >   }
> > @@ -1671,7 +1692,9 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
> >                       tmp->last = eaddr;
> >
> >                       tmp->bo_va = NULL;
> > +                     spin_lock(&vm->status_lock);
> >                       list_add(&tmp->list, &vm->freed);
> > +                     spin_unlock(&vm->status_lock);
> >                       trace_amdgpu_vm_bo_unmap(NULL, tmp);
> >               }
> >
> > @@ -1788,7 +1811,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
> >               amdgpu_vm_it_remove(mapping, &vm->va);
> >               mapping->bo_va = NULL;
> >               trace_amdgpu_vm_bo_unmap(bo_va, mapping);
> > +             spin_lock(&vm->status_lock);
> >               list_add(&mapping->list, &vm->freed);
> > +             spin_unlock(&vm->status_lock);
> >       }
> >       list_for_each_entry_safe(mapping, next, &bo_va->invalids, list) {
> >               list_del(&mapping->list);
>
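
For readers less familiar with the pattern the patch relies on, here is a
minimal, self-contained userspace sketch of the same idea: detach the shared
"freed" list while holding the lock, walk the private copy with the lock
dropped, and splice any unprocessed entries back if processing fails part-way.
This is illustrative only (pthreads, hand-rolled list); none of the names
below are amdgpu code.

/* Sketch of the drain-then-splice-back pattern; not amdgpu code. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
        struct node *next;
        int id;
};

static pthread_mutex_t freed_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *freed;      /* shared list, protected by freed_lock */

static int process(struct node *n)
{
        printf("processing %d\n", n->id);
        return 0;               /* return nonzero here to simulate a failure */
}

static int clear_freed(void)
{
        struct node *local, *n;

        /* detach the whole list while holding the lock */
        pthread_mutex_lock(&freed_lock);
        local = freed;
        freed = NULL;
        pthread_mutex_unlock(&freed_lock);

        /* walk the private copy without holding the lock */
        while ((n = local) != NULL) {
                int r = process(n);

                if (r) {
                        /* splice the unprocessed tail (including n) back */
                        pthread_mutex_lock(&freed_lock);
                        struct node **tail = &freed;

                        while (*tail)
                                tail = &(*tail)->next;
                        *tail = n;
                        pthread_mutex_unlock(&freed_lock);
                        return r;
                }
                local = n->next;
                free(n);
        }
        return 0;
}

int main(void)
{
        for (int i = 0; i < 3; i++) {
                struct node *n = malloc(sizeof(*n));

                n->id = i;
                pthread_mutex_lock(&freed_lock);
                n->next = freed;
                freed = n;
                pthread_mutex_unlock(&freed_lock);
        }
        return clear_freed();
}

The patch does the equivalent with the kernel list helpers: list_replace_init()
performs the detach and list_splice_tail() the put-back, both under
vm->status_lock, so concurrent callers never iterate the same entries.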