On 2021-03-25 at 12:10 p.m., Christian König wrote:
>
> On 2021-03-25 at 17:03, Felix Kuehling wrote:
>> Hi,
>>
>> This is a long one with a proposal for a pretty significant redesign
>> of how we handle migrations and VRAM management with HMM. This is
>> based on my own debugging and reading of the migrate_vma helpers, as
>> well as Alex's problems with migrations on A+A. I hope we can
>> discuss this next Monday after you've had some time to digest it.
>>
>> I did some debugging yesterday and found that migrations to VRAM can
>> fail for some pages. The current migration helpers have many corner
>> cases where a page cannot be migrated. Some of them may be fixable
>> (adding support for THP), others are not (locked pages are skipped
>> to avoid deadlocks). Therefore I think our current code is too
>> inflexible when it assumes that a range is entirely in one place.
>>
>> Alex also ran into some funny issues with COW on A+A where some
>> pages get faulted back to system memory. I think a lot of the
>> problems here will get easier once we support mixed mappings.
>>
>> Mixed GPU mappings
>> ==================
>>
>> The idea is to remove any assumption that an entire svm_range is in
>> one place. Instead, hmm_range_fault gives us a list of pages, some
>> of which are in system memory while others are device_private or
>> device_generic.
>>
>> We will need an amdgpu_vm interface that lets us map mixed page
>> arrays where different pages use different PTE flags. We can have at
>> least 3 different types of pages in one mapping:
>>
>> 1. System memory (S-bit set)
>> 2. Local memory (S-bit cleared, MTYPE for local memory)
>> 3. Remote XGMI memory (S-bit cleared, MTYPE+C for remote memory)
>>
>> My idea is to change the amdgpu_vm_update_mapping interface to use
>> some high bits in the pages_addr array to indicate the page type.
>> For the default page type (0) nothing really changes for the
>> callers. The "flags" parameter needs to become a pointer to an array
>> that gets indexed by the high bits from the pages_addr array. For
>> existing callers it's as easy as changing flags to &flags (an array
>> of size 1). For HMM we would pass a pointer to a real array.
>
> Yeah, I've thought about stuff like that as well for a while.
>
> Problem is that this won't work that easily. We assume in many
> places that the flags don't change for the range in question.

I think some lower level functions assume that the flags stay the same
for physically contiguous ranges. But if you use the high bits to
encode the page type, those ranges won't be contiguous anymore. So you
can use different page flags for different contiguous ranges.

Regards,
  Felix

>
> We would somehow need to change that to get the flags directly from
> the low bits of the DMA address or something instead.
>
> Christian.
>
>>
>> Once this is done, it leads to a number of opportunities for
>> simplification and better efficiency in kfd_svm:
>>
>> * Support migration when cpages != npages
>> * Migrate a part of an svm_range without splitting it. No more
>>   splitting of ranges in CPU page faults
>> * Migrate a part of an svm_range in the GPU page fault handler. No
>>   more migrating the whole range for a single page fault
>> * Simplified VRAM management (see below)
>>
>> With that, svm_range will no longer have an "actual_loc" field. If
>> we're not sure where the data is, we need to call migrate. If it's
>> already in the right place, then cpages will be 0 and we can skip
>> all the steps after migrate_vma_setup.
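
To make the pages_addr encoding in the quoted proposal more concrete,
here is a rough sketch. The AMDGPU_PTYPE_* names and the choice of
bits 62-63 are invented for illustration; the real encoding would have
to use bits that can never occur in a valid DMA address:

    /*
     * Sketch only: AMDGPU_PTYPE_* and the use of bits 62-63 are
     * made up for illustration.
     */
    #define AMDGPU_PTYPE_SHIFT  62
    #define AMDGPU_PTYPE_MASK   (3ULL << AMDGPU_PTYPE_SHIFT)

    enum amdgpu_page_type {
        AMDGPU_PTYPE_SYSTEM = 0,  /* S-bit set */
        AMDGPU_PTYPE_LOCAL  = 1,  /* S-bit cleared, local MTYPE */
        AMDGPU_PTYPE_XGMI   = 2,  /* S-bit cleared, MTYPE+C */
    };

    static unsigned int amdgpu_page_type(uint64_t addr)
    {
        return (addr & AMDGPU_PTYPE_MASK) >> AMDGPU_PTYPE_SHIFT;
    }

    /*
     * amdgpu_vm_update_mapping would mask the type bits off before
     * building the PTE and pick the flags from the array:
     *
     *    pte_flags = flags[amdgpu_page_type(pages_addr[i])];
     *    addr      = pages_addr[i] & ~AMDGPU_PTYPE_MASK;
     */

Existing callers only ever produce page type 0, so passing &flags
keeps them working unchanged; the HMM path would fill a real
three-entry array with one set of PTE flags per page type.
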
>>
>> Simplified VRAM management
>> ==========================
>>
>> VRAM BOs are no longer associated with pranges. Instead they are
>> "free-floating", allocated during migration to VRAM, with a
>> reference count for each page that uses the BO. The ref is released
>> in the page-release callback. When the ref count drops to 0, free
>> the BO.
>>
>> The VRAM BO size should match the migration granularity, 2MB by
>> default. That way the BO can be freed when memory gets migrated out.
>> If migration of some pages fails, the BO may not be fully occupied.
>> Also, some pages may be released individually on A+A due to COW or
>> other events.
>>
>> Eviction needs to migrate all the pages still using the BO. If the
>> BO struct keeps an array of page pointers, that's basically the
>> migrate.src for the eviction. Migration calls "try_to_unmap", which
>> has the best chance of freeing the BO, even when it is shared by
>> multiple processes.
>>
>> If we cannot guarantee eviction of pages, we cannot use TTM for VRAM
>> allocations. We would need to use amdgpu_vram_mgr directly, and we
>> need a way to detect memory pressure so we can start evicting
>> memory.
>>
>> Regards,
>>   Felix
>>
>
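
The free-floating VRAM BO from the quoted section could look roughly
like this. All names here (svm_vram_bo, svm_page_free, svm_page_index)
are invented; the only real kernel hooks used are the page_free
callback in dev_pagemap_ops and page->zone_device_data:

    struct svm_vram_bo {
        struct amdgpu_bo *bo;
        struct kref refcount;
        /*
         * One slot per 4KB page in the 2MB granule. The non-NULL
         * entries are basically migrate.src when the BO has to be
         * evicted.
         */
        struct page *pages[512];
    };

    static void svm_vram_bo_release(struct kref *kref)
    {
        struct svm_vram_bo *vbo =
            container_of(kref, struct svm_vram_bo, refcount);

        amdgpu_bo_unref(&vbo->bo);
        kfree(vbo);
    }

    /* dev_pagemap_ops.page_free: a device page was released */
    static void svm_page_free(struct page *page)
    {
        struct svm_vram_bo *vbo = page->zone_device_data;

        /* svm_page_index() is a hypothetical pfn-to-slot helper */
        vbo->pages[svm_page_index(vbo, page)] = NULL;
        kref_put(&vbo->refcount, svm_vram_bo_release);
    }

Eviction would then walk vbo->pages, collect the pages still present
as the migration source, and let the migration (which does the
try_to_unmap) drop the remaining references so the BO can be freed.
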