I have been looking at this in relation to the migration code and noticed we
have the following in try_to_migrate():

        if (is_zone_device_page(page) && !is_device_private_page(page))
                return;

Which if I'm understanding correctly means that migration of device coherent
pages will always fail. Given that I do wonder how hmm-tests are passing, but
I assume you must always be hitting this fast path in
migrate_vma_collect_pmd():

                /*
                 * Optimize for the common case where page is only mapped once
                 * in one process. If we can lock the page, then we can safely
                 * set up a special migration page table entry now.
                 */

Meaning that try_to_migrate() never gets called from migrate_vma_unmap(). So
you will also need some changes to try_to_migrate() and possibly
try_to_migrate_one() to make this reliable.

 - Alistair

On Tuesday, 11 January 2022 9:31:51 AM AEDT Alex Sierra wrote:
> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> owned by a device that can be mapped into CPU page tables like
> MEMORY_DEVICE_GENERIC and can also be migrated like
> MEMORY_DEVICE_PRIVATE.
>
> Christoph, the suggestion to incorporate Ralph Campbell’s refcount
> cleanup patch into our hardware page migration patchset originally came
> from you, but it proved impractical to do things in that order because
> the refcount cleanup introduced a bug with wide ranging structural
> implications. Instead, we amended Ralph’s patch so that it could be
> applied after merging the migration work. As we saw from the recent
> discussion, merging the refcount work is going to take some time and
> cooperation between multiple development groups, while the migration
> work is ready now and is needed now. So we propose to merge this
> patchset first and continue to work with Ralph and others to merge the
> refcount cleanup separately, when it is ready.
>
> This patch series is mostly self-contained except for a few places where
> it needs to update other subsystems to handle the new memory type.
> System stability and performance are not affected according to our
> ongoing testing, including xfstests.
>
> How it works: The system BIOS advertises the GPU device memory
> (aka VRAM) as SPM (special purpose memory) in the UEFI system address
> map.
>
> The amdgpu driver registers the memory with devmap as
> MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
> this hardware page migration capability is the Frontier supercomputer
> project. This functionality is not AMD-specific. We expect other GPU
> vendors to find this functionality useful, and possibly other hardware
> types in the future.
>
> Our test nodes in the lab are similar to the Frontier configuration,
> with .5 TB of system memory plus 256 GB of device memory split across
> 4 GPUs, all in a single coherent address space. Page migration is
> expected to improve application efficiency significantly. We will
> report empirical results as they become available.
>
> We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This
> patch set builds on HMM and our SVM memory manager already merged in
> 5.15.
>
> v2:
> - test_hmm is now able to create private and coherent device mirror
> instances in the same driver probe. This adds more usability to the hmm
> test by not having to remove the kernel module for each device type
> test (private/coherent type). This is done by passing the module
> parameters spm_addr_dev0 & spm_addr_dev1. In this case, it will create
> four instances of device_mirror. The first two correspond to private
> device type, the last two to coherent type. Then, they can be easily
> accessed from user space through /dev/hmm_mirror<num_device>. Usually
> num_device 0 and 1 are for private, and 2 and 3 for coherent types.
>
> - Coherent device type pages at gup are now migrated back to system
> memory if they have been long term pinned (FOLL_LONGTERM). The reason
> is these pages could eventually interfere with their own device memory
> manager.
> A new hmm_gup_test has been added to the hmm-test to test this
> functionality. It makes use of the gup_test module to long term pin
> user pages that have been migrated to device memory first.
>
> - Other patch corrections made by Felix, Alistair and Christoph.
>
> v3:
> - Based on last v2 feedback we got from Alistair, we've decided to
> remove migration logic for FOLL_LONGTERM coherent device type pages at
> gup for now. Ideally, this should be done through the kernel mm,
> instead of calling the device driver to do it. Currently, there's no
> support for migrating device pages based on pfn, mainly because
> migrate_pages() relies on pages being LRU pages. Alistair mentioned he
> has started to work on adding this migrate device pages logic. For now,
> we fail on get_user_pages call with FOLL_LONGTERM for DEVICE_COHERENT
> pages.
>
> - Also, hmm_gup_test has been removed from hmm-test. We plan to include
> it again after this migration work is ready.
>
> - Addressed Liam Howlett's feedback changes.
>
> Alex Sierra (10):
>   mm: add zone device coherent type memory support
>   mm: add device coherent vma selection for memory migration
>   mm/gup: fail get_user_pages for LONGTERM dev coherent type
>   drm/amdkfd: add SPM support for SVM
>   drm/amdkfd: coherent type as sys mem on migration to ram
>   lib: test_hmm add ioctl to get zone device type
>   lib: test_hmm add module param for zone device type
>   lib: add support for device coherent type in test_hmm
>   tools: update hmm-test to support device coherent type
>   tools: update test_hmm script to support SP config
>
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  34 ++-
>  include/linux/memremap.h                 |   8 +
>  include/linux/migrate.h                  |   1 +
>  include/linux/mm.h                       |  16 ++
>  lib/test_hmm.c                           | 333 +++++++++++++++++------
>  lib/test_hmm_uapi.h                      |  22 +-
>  mm/gup.c                                 |   7 +
>  mm/memcontrol.c                          |   6 +-
>  mm/memory-failure.c                      |   8 +-
>  mm/memremap.c                            |   5 +-
>  mm/migrate.c                             |  30 +-
>  tools/testing/selftests/vm/hmm-tests.c   | 122 +++++++--
>  tools/testing/selftests/vm/test_hmm.sh   |  24 +-
>  13 files changed, 475 insertions(+), 141 deletions(-)
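
To make the try_to_migrate() concern from the top of this mail concrete, here
is a small userspace model of the early-return check. It is only a sketch and
not kernel code: struct page is reduced to a type tag, MEMORY_SYSTEM is a
hypothetical stand-in for an ordinary (non zone device) page, and
is_device_coherent_page() is an assumed helper name that the series may spell
differently. bails_out_relaxed() shows one possible way to let coherent pages
through. It is an assumption for illustration, not a tested fix.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The MEMORY_DEVICE_* names follow the cover letter;
 * MEMORY_SYSTEM is a hypothetical stand-in for a non zone device page.
 */
enum memory_type {
	MEMORY_SYSTEM,
	MEMORY_DEVICE_PRIVATE,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_COHERENT,
};

/* Reduced to the one field this model needs. */
struct page {
	enum memory_type type;
};

bool is_zone_device_page(const struct page *page)
{
	return page->type != MEMORY_SYSTEM;
}

bool is_device_private_page(const struct page *page)
{
	return page->type == MEMORY_DEVICE_PRIVATE;
}

/* Assumed helper name; the series may spell it differently. */
bool is_device_coherent_page(const struct page *page)
{
	return page->type == MEMORY_DEVICE_COHERENT;
}

/* True when the check quoted at the top of this mail would return early. */
bool bails_out_today(const struct page *page)
{
	return is_zone_device_page(page) && !is_device_private_page(page);
}

/* One possible relaxation (an assumption, not a tested fix). */
bool bails_out_relaxed(const struct page *page)
{
	return is_zone_device_page(page) &&
	       !is_device_private_page(page) &&
	       !is_device_coherent_page(page);
}

/* Sanity checks for the model itself. */
void model_self_check(void)
{
	struct page coherent = { .type = MEMORY_DEVICE_COHERENT };
	struct page generic = { .type = MEMORY_DEVICE_GENERIC };

	assert(bails_out_today(&coherent));    /* coherent pages are skipped */
	assert(!bails_out_relaxed(&coherent)); /* relaxed check lets them through */
	assert(bails_out_relaxed(&generic));   /* generic pages are still skipped */
}
```

In this model a MEMORY_DEVICE_COHERENT page hits the early return with the
current check, which matches the reading that try_to_migrate() cannot unmap
such pages unless the fast path in migrate_vma_collect_pmd() already did the
work. Whether the real fix belongs in try_to_migrate(), try_to_migrate_one(),
or both is left open here.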