On Tue, Aug 27, 2024 at 07:48:38PM -0700, Matthew Brost wrote: > +int drm_gpusvm_migrate_to_sram(struct drm_gpusvm *gpusvm, > + struct drm_gpusvm_range *range, > + const struct drm_gpusvm_ctx *ctx) > +{ > + u64 start = range->va.start, end = range->va.end; > + struct mm_struct *mm = gpusvm->mm; > + struct vm_area_struct *vas; > + int err; > + bool retry = false; > + > + if (!ctx->mmap_locked) { > + if (!mmget_not_zero(mm)) { > + err = -EFAULT; > + goto err_out; > + } > + if (ctx->trylock_mmap) { > + if (!mmap_read_trylock(mm)) { > + err = drm_gpusvm_evict_to_sram(gpusvm, range); > + goto err_mmput; > + } > + } else { > + mmap_read_lock(mm); > + } > + } > + > + mmap_assert_locked(mm); > + > + /* > + * Loop required to find all VMA area structs for the corner case when > + * VRAM backing has been partially unmapped from MM's address space. > + */ > +again: > + vas = find_vma(mm, start); So a hiliarous case that amdkfd gets a bit better but still not entirely is that the original vma might entirely gone. Even when you can still get at the mm of that process. This happens with cow (or shared too I think) mappings in forked child processes, or also if you play fun mremap games. I think that outside of the ->migrate_to_ram callback migration/eviction to sram cannot assume there's any reasonable vma around and has to unconditionally go with the drm_gpusvm_evict_to_sram path. Also in the migrate_to_ram case the vma is essentially nothing else that informational about which ranges we might need if we prefault a bit (in case the child changed the vma compared to the original one). So it's good to as parameter for migrate_vma_setup, but absolutely nothing else. amdkfd almost gets this right by being entirely based on their svm_range structures, except they still have the lingering check that the orignal mm is still alive. Of course you cannot ever use that memory on the gpu anymore, but the child process could get very pissed if their memory is suddenly gone. Also the eviction code has the same issue as yours and limits itself to vma that still exist in the original mm, leaving anything that's orphaned in children or remaps stuck in vram. At least that's my understanding, I might very well be wrong. So probably want a bunch of these testcases too to make sure that all works, and we're not stuck with memory allocations in vram that we can't move out. -Sima -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch