On 21.06.22 18:08, Sierra Guiza, Alejandro (Alex) wrote: > > On 6/21/2022 7:25 AM, David Hildenbrand wrote: >> On 21.06.22 13:55, Alistair Popple wrote: >>> David Hildenbrand<david@xxxxxxxxxx> writes: >>> >>>> On 21.06.22 13:25, Felix Kuehling wrote: >>>>> Am 6/17/22 um 23:19 schrieb David Hildenbrand: >>>>>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: >>>>>>> On 6/17/2022 12:33 PM, David Hildenbrand wrote: >>>>>>>> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >>>>>>>>> On 6/17/2022 4:40 AM, David Hildenbrand wrote: >>>>>>>>>> On 31.05.22 22:00, Alex Sierra wrote: >>>>>>>>>>> Device memory that is cache coherent from device and CPU point of view. >>>>>>>>>>> This is used on platforms that have an advanced system bus (like CAPI >>>>>>>>>>> or CXL). Any page of a process can be migrated to such memory. However, >>>>>>>>>>> no one should be allowed to pin such memory so that it can always be >>>>>>>>>>> evicted. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Alex Sierra<alex.sierra@xxxxxxx> >>>>>>>>>>> Acked-by: Felix Kuehling<Felix.Kuehling@xxxxxxx> >>>>>>>>>>> Reviewed-by: Alistair Popple<apopple@xxxxxxxxxx> >>>>>>>>>>> [hch: rebased ontop of the refcount changes, >>>>>>>>>>> removed is_dev_private_or_coherent_page] >>>>>>>>>>> Signed-off-by: Christoph Hellwig<hch@xxxxxx> >>>>>>>>>>> --- >>>>>>>>>>> include/linux/memremap.h | 19 +++++++++++++++++++ >>>>>>>>>>> mm/memcontrol.c | 7 ++++--- >>>>>>>>>>> mm/memory-failure.c | 8 ++++++-- >>>>>>>>>>> mm/memremap.c | 10 ++++++++++ >>>>>>>>>>> mm/migrate_device.c | 16 +++++++--------- >>>>>>>>>>> mm/rmap.c | 5 +++-- >>>>>>>>>>> 6 files changed, 49 insertions(+), 16 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>>>>>>>>>> index 8af304f6b504..9f752ebed613 100644 >>>>>>>>>>> --- a/include/linux/memremap.h >>>>>>>>>>> +++ b/include/linux/memremap.h >>>>>>>>>>> @@ -41,6 +41,13 @@ struct vmem_altmap { >>>>>>>>>>> * A more complete discussion of unaddressable memory may be found in >>>>>>>>>>> * include/linux/hmm.h and Documentation/vm/hmm.rst. >>>>>>>>>>> * >>>>>>>>>>> + * MEMORY_DEVICE_COHERENT: >>>>>>>>>>> + * Device memory that is cache coherent from device and CPU point of view. This >>>>>>>>>>> + * is used on platforms that have an advanced system bus (like CAPI or CXL). A >>>>>>>>>>> + * driver can hotplug the device memory using ZONE_DEVICE and with that memory >>>>>>>>>>> + * type. Any page of a process can be migrated to such memory. However no one >>>>>>>>>> Any page might not be right, I'm pretty sure. ... just thinking about special pages >>>>>>>>>> like vdso, shared zeropage, ... pinned pages ... >>>>>>>> Well, you cannot migrate long term pages, that's what I meant :) >>>>>>>> >>>>>>>>>>> + * should be allowed to pin such memory so that it can always be evicted. >>>>>>>>>>> + * >>>>>>>>>>> * MEMORY_DEVICE_FS_DAX: >>>>>>>>>>> * Host memory that has similar access semantics as System RAM i.e. DMA >>>>>>>>>>> * coherent and supports page pinning. In support of coordinating page >>>>>>>>>>> @@ -61,6 +68,7 @@ struct vmem_altmap { >>>>>>>>>>> enum memory_type { >>>>>>>>>>> /* 0 is reserved to catch uninitialized type fields */ >>>>>>>>>>> MEMORY_DEVICE_PRIVATE = 1, >>>>>>>>>>> + MEMORY_DEVICE_COHERENT, >>>>>>>>>>> MEMORY_DEVICE_FS_DAX, >>>>>>>>>>> MEMORY_DEVICE_GENERIC, >>>>>>>>>>> MEMORY_DEVICE_PCI_P2PDMA, >>>>>>>>>>> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) >>>>>>>>>> In general, this LGTM, and it should be correct with PageAnonExclusive I think. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> However, where exactly is pinning forbidden? >>>>>>>>> Long-term pinning is forbidden since it would interfere with the device >>>>>>>>> memory manager owning the >>>>>>>>> device-coherent pages (e.g. evictions in TTM). However, normal pinning >>>>>>>>> is allowed on this device type. >>>>>>>> I don't see updates to folio_is_pinnable() in this patch. >>>>>>> Device coherent type pages should return true here, as they are pinnable >>>>>>> pages. >>>>>> That function is only called for long-term pinnings in try_grab_folio(). >>>>>> >>>>>>>> So wouldn't try_grab_folio() simply pin these pages? What am I missing? >>>>>>> As far as I understand this return NULL for long term pin pages. >>>>>>> Otherwise they get refcount incremented. >>>>>> I don't follow. >>>>>> >>>>>> You're saying >>>>>> >>>>>> a) folio_is_pinnable() returns true for device coherent pages >>>>>> >>>>>> and that >>>>>> >>>>>> b) device coherent pages don't get long-term pinned >>>>>> >>>>>> >>>>>> Yet, the code says >>>>>> >>>>>> struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) >>>>>> { >>>>>> if (flags & FOLL_GET) >>>>>> return try_get_folio(page, refs); >>>>>> else if (flags & FOLL_PIN) { >>>>>> struct folio *folio; >>>>>> >>>>>> /* >>>>>> * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a >>>>>> * right zone, so fail and let the caller fall back to the slow >>>>>> * path. >>>>>> */ >>>>>> if (unlikely((flags & FOLL_LONGTERM) && >>>>>> !is_pinnable_page(page))) >>>>>> return NULL; >>>>>> ... >>>>>> return folio; >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> What prevents these pages from getting long-term pinned as stated in this patch? >>>>> Long-term pinning is handled by __gup_longterm_locked, which migrates >>>>> pages returned by __get_user_pages_locked that cannot be long-term >>>>> pinned. try_grab_folio is OK to grab the pages. Anything that can't be >>>>> long-term pinned will be migrated afterwards, and >>>>> __get_user_pages_locked will be retried. The migration of >>>>> DEVICE_COHERENT pages was implemented by Alistair in patch 5/13 >>>>> ("mm/gup: migrate device coherent pages when pinning instead of failing"). >>>> Thanks. >>>> >>>> __gup_longterm_locked()->check_and_migrate_movable_pages() >>>> >>>> Which checks folio_is_pinnable() and doesn't do anything if set. >>>> >>>> Sorry to be dense here, but I don't see how what's stated in this patch >>>> works without adjusting folio_is_pinnable(). >>> Ugh, I think you might be right about try_grab_folio(). >>> >>> We didn't update folio_is_pinnable() to include device coherent pages >>> because device coherent pages are pinnable. It is really just >>> FOLL_LONGTERM that we want to prevent here. >>> >>> For normal PUP that is done by my change in >>> check_and_migrate_movable_pages() which migrates pages being pinned with >>> FOLL_LONGTERM. But I think I incorrectly assumed we would take the >>> pte_devmap() path in gup_pte_range(), which we don't for coherent pages. >>> So I think the check in try_grab_folio() needs to be: >> I think I said it already (and I might be wrong without reading the >> code), but folio_is_pinnable() is *only* called for long-term pinnings. >> >> It should actually be called folio_is_longterm_pinnable(). >> >> That's where that check should go, no? > > David, I think you're right. We didn't catch this since the LONGTERM gup > test we added to hmm-test only calls to pin_user_pages. Apparently > try_grab_folio is called only from fast callers (ex. > pin_user_pages_fast/get_user_pages_fast). I have added a conditional > similar to what Alistair has proposed to return null on LONGTERM && > (coherent_pages || folio_is_pinnable) at try_grab_folio. Also a new gup > test was added with LONGTERM set that calls pin_user_pages_fast. > Returning null under this condition it does causes the migration from > dev to system memory. > Why can't coherent memory simply put its checks into folio_is_pinnable()? I don't get it why we have to do things differently here. > Actually, Im having different problems with a call to PageAnonExclusive > from try_to_migrate_one during page fault from a HMM test that first > migrate pages to device private and forks to mark as COW these pages. > Apparently is catching the first BUG VM_BUG_ON_PGFLAGS(!PageAnon(page), > page) With or without this series? A backtrace would be great. -- Thanks, David / dhildenb