On 2021-03-25 at 12:16 p.m., Christian König wrote:
> On 25.03.21 at 17:14, Felix Kuehling wrote:
>> On 2021-03-25 at 12:10 p.m., Christian König wrote:
>>>
>>> On 25.03.21 at 17:03, Felix Kuehling wrote:
>>>> Hi,
>>>>
>>>> This is a long one with a proposal for a pretty significant
>>>> redesign of how we handle migrations and VRAM management with HMM.
>>>> This is based on my own debugging and reading of the migrate_vma
>>>> helpers, as well as Alex's problems with migrations on A+A. I hope
>>>> we can discuss this next Monday after you've had some time to
>>>> digest it.
>>>>
>>>> I did some debugging yesterday and found that migrations to VRAM
>>>> can fail for some pages. The current migration helpers have many
>>>> corner cases where a page cannot be migrated. Some of them may be
>>>> fixable (adding support for THP), others are not (locked pages are
>>>> skipped to avoid deadlocks). Therefore I think our current code is
>>>> too inflexible when it assumes that a range is entirely in one
>>>> place.
>>>>
>>>> Alex also ran into some funny issues with COW on A+A where some
>>>> pages get faulted back to system memory. I think a lot of the
>>>> problems here will get easier once we support mixed mappings.
>>>>
>>>> Mixed GPU mappings
>>>> ==================
>>>>
>>>> The idea is to remove any assumption that an entire svm_range is
>>>> in one place. Instead, hmm_range_fault gives us a list of pages,
>>>> some of which are system memory and others device_private or
>>>> device_generic.
>>>>
>>>> We will need an amdgpu_vm interface that lets us map mixed page
>>>> arrays where different pages use different PTE flags. We can have
>>>> at least 3 different types of pages in one mapping:
>>>>
>>>> 1. System memory (S-bit set)
>>>> 2. Local memory (S-bit cleared, MTYPE for local memory)
>>>> 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)
>>>>
>>>> My idea is to change the amdgpu_vm_update_mapping interface to use
>>>> some high bits in the pages_addr array to indicate the page type.
>>>> For the default page type (0) nothing really changes for the
>>>> callers. The "flags" parameter needs to become a pointer to an
>>>> array that gets indexed by the high bits from the pages_addr
>>>> array. For existing callers it's as easy as changing flags to
>>>> &flags (an array of size 1). For HMM we would pass a pointer to a
>>>> real array.
>>> Yeah, I've thought about stuff like that as well for a while.
>>>
>>> The problem is that this won't work that easily. We assume in many
>>> places that the flags don't change for the range in question.
>> I think some lower-level functions assume that the flags stay the
>> same for physically contiguous ranges. But if you use the high bits
>> to encode the page type, the ranges won't be contiguous any more. So
>> you can change page flags for different contiguous ranges.
>
> Yeah, but then you also get absolutely zero THP and fragment flags
> support.

As long as you have a contiguous 2MB page with the same page type, I
think you can still get a THP mapping in the GPU page table. But if one
page in the middle of a 2MB page has a different page type, that will
break the THP mapping, as it should.

Regards,
  Felix

>
> But I think we could also add those later on.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>
>>> We would somehow need to change that to get the flags directly from
>>> the low bits of the DMA address or something instead.
>>>
>>> Christian.
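To make the flags-array idea above a bit more concrete, here is a
rough, self-contained sketch of the encoding, written as plain
userspace C for illustration rather than as actual driver code. The
SVM_PAGE_* names, the choice of bits 62-63 for the page type, and the
flag values are all made up, and the real amdgpu_vm_update_mapping of
course takes more parameters than shown here:

/*
 * Hypothetical sketch (not actual amdgpu code): the page type is
 * stored in otherwise unused high bits of each pages_addr entry, and
 * the per-page PTE flags are looked up in a small array indexed by
 * that type.
 */
#include <stdint.h>
#include <stdio.h>

#define SVM_PAGE_TYPE_SHIFT 62                 /* assumed-free high bits */
#define SVM_PAGE_TYPE_MASK  (3ULL << SVM_PAGE_TYPE_SHIFT)
#define SVM_PAGE_ADDR_MASK  (~SVM_PAGE_TYPE_MASK)

enum svm_page_type {
	SVM_PAGE_SYSTEM = 0,   /* S-bit set */
	SVM_PAGE_LOCAL  = 1,   /* S-bit cleared, local MTYPE */
	SVM_PAGE_XGMI   = 2,   /* S-bit cleared, MTYPE+C for remote memory */
};

/* Per-type PTE flags, filled in by the caller (made-up values here). */
static const uint64_t pte_flags[3] = {
	[SVM_PAGE_SYSTEM] = 0x1,
	[SVM_PAGE_LOCAL]  = 0x2,
	[SVM_PAGE_XGMI]   = 0x4,
};

static uint64_t pages_addr_encode(uint64_t addr, enum svm_page_type type)
{
	return (addr & SVM_PAGE_ADDR_MASK) |
	       ((uint64_t)type << SVM_PAGE_TYPE_SHIFT);
}

/* What a flags-array aware update_mapping would do for each page. */
static void map_one_page(uint64_t encoded, const uint64_t *flags)
{
	enum svm_page_type type = encoded >> SVM_PAGE_TYPE_SHIFT;
	uint64_t addr = encoded & SVM_PAGE_ADDR_MASK;

	printf("map addr 0x%llx with flags 0x%llx\n",
	       (unsigned long long)addr,
	       (unsigned long long)flags[type]);
}

int main(void)
{
	/* A mixed range: one system page followed by one local VRAM page. */
	uint64_t pages_addr[2] = {
		pages_addr_encode(0x100000, SVM_PAGE_SYSTEM),
		pages_addr_encode(0x200000, SVM_PAGE_LOCAL),
	};

	for (int i = 0; i < 2; i++)
		map_one_page(pages_addr[i], pte_flags);
	return 0;
}

Existing callers that only ever map one page type would keep working by
passing a one-element flags array, as described above.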
>>>
>>>> Once this is done, it leads to a number of opportunities for
>>>> simplification and better efficiency in kfd_svm:
>>>>
>>>>   * Support migration when cpages != npages
>>>>   * Migrate a part of an svm_range without splitting it. No more
>>>>     splitting of ranges in CPU page faults
>>>>   * Migrate a part of an svm_range in the GPU page fault handler.
>>>>     No more migrating the whole range for a single page fault
>>>>   * Simplified VRAM management (see below)
>>>>
>>>> With that, svm_range will no longer have an "actual_loc" field. If
>>>> we're not sure where the data is, we need to call migrate. If it's
>>>> already in the right place, then cpages will be 0 and we can skip
>>>> all the steps after migrate_vma_setup.
>>>>
>>>> Simplified VRAM management
>>>> ==========================
>>>>
>>>> VRAM BOs are no longer associated with pranges. Instead they are
>>>> "free-floating", allocated during migration to VRAM, with a
>>>> reference count for each page that uses the BO. The ref is
>>>> released in the page-release callback. When the ref count drops to
>>>> 0, free the BO.
>>>>
>>>> The VRAM BO size should match the migration granularity, 2MB by
>>>> default. That way the BO can be freed when memory gets migrated
>>>> out. If migration of some pages fails, the BO may not be fully
>>>> occupied. Also, some pages may be released individually on A+A due
>>>> to COW or other events.
>>>>
>>>> Eviction needs to migrate all the pages still using the BO. If the
>>>> BO struct keeps an array of page pointers, that's basically the
>>>> migrate.src for the eviction. Migration calls "try_to_unmap",
>>>> which has the best chance of freeing the BO, even when it is
>>>> shared by multiple processes.
>>>>
>>>> If we cannot guarantee eviction of pages, we cannot use TTM for
>>>> VRAM allocations. We need to use amdgpu_vram_mgr instead, and we
>>>> need a way to detect memory pressure so we can start evicting
>>>> memory.
>>>>
>>>> Regards,
>>>>   Felix
>>>>
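To illustrate the VRAM BO life cycle described above, here is another
rough, self-contained sketch, again plain userspace C rather than
actual driver code. The svm_vram_bo struct and function names are made
up, a plain counter stands in for a kref, and void pointers stand in
for struct page pointers; the point is the per-page reference counting
and the way the BO's page array doubles as migrate.src for eviction:

/*
 * Hypothetical sketch (not actual amdgpu code) of a "free-floating"
 * VRAM BO: each page that lives in the BO holds a reference, the
 * page-release callback drops it, and the BO is freed when the last
 * reference is gone.  The page-pointer array is also what eviction
 * uses to build migrate.src.
 */
#include <stdio.h>
#include <stdlib.h>

#define PAGES_PER_BO 512    /* 2MB migration granularity / 4KB pages */

struct svm_vram_bo {
	void *pages[PAGES_PER_BO];   /* stand-in for struct page * */
	unsigned int refcount;       /* one reference per page in use */
	/* VRAM offset, size, etc. would live here as well. */
};

static struct svm_vram_bo *svm_vram_bo_create(void)
{
	return calloc(1, sizeof(struct svm_vram_bo));
}

/* Called when a page is successfully migrated into the BO. */
static void svm_vram_bo_get_page(struct svm_vram_bo *bo, unsigned int idx,
				 void *page)
{
	bo->pages[idx] = page;
	bo->refcount++;
}

/* Called from the page-release callback when a page leaves the BO. */
static void svm_vram_bo_put_page(struct svm_vram_bo *bo, unsigned int idx)
{
	bo->pages[idx] = NULL;
	if (--bo->refcount == 0) {
		printf("last page released, freeing BO\n");
		free(bo);
	}
}

/*
 * Eviction: every page still using the BO becomes a migrate.src entry,
 * so migrating them all out drops the refcount to 0 and frees the BO.
 */
static unsigned int svm_vram_bo_collect_src(struct svm_vram_bo *bo, void **src)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < PAGES_PER_BO; i++)
		if (bo->pages[i])
			src[n++] = bo->pages[i];
	return n;
}

int main(void)
{
	struct svm_vram_bo *bo = svm_vram_bo_create();
	void *src[PAGES_PER_BO];
	int dummy_page;

	svm_vram_bo_get_page(bo, 0, &dummy_page);
	printf("pages to evict: %u\n", svm_vram_bo_collect_src(bo, src));
	svm_vram_bo_put_page(bo, 0);   /* drops the last ref, frees the BO */
	return 0;
}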