On 21.06.22 13:55, Alistair Popple wrote: > > David Hildenbrand <david@xxxxxxxxxx> writes: > >> On 21.06.22 13:25, Felix Kuehling wrote: >>> >>> Am 6/17/22 um 23:19 schrieb David Hildenbrand: >>>> On 17.06.22 21:27, Sierra Guiza, Alejandro (Alex) wrote: >>>>> On 6/17/2022 12:33 PM, David Hildenbrand wrote: >>>>>> On 17.06.22 19:20, Sierra Guiza, Alejandro (Alex) wrote: >>>>>>> On 6/17/2022 4:40 AM, David Hildenbrand wrote: >>>>>>>> On 31.05.22 22:00, Alex Sierra wrote: >>>>>>>>> Device memory that is cache coherent from device and CPU point of view. >>>>>>>>> This is used on platforms that have an advanced system bus (like CAPI >>>>>>>>> or CXL). Any page of a process can be migrated to such memory. However, >>>>>>>>> no one should be allowed to pin such memory so that it can always be >>>>>>>>> evicted. >>>>>>>>> >>>>>>>>> Signed-off-by: Alex Sierra <alex.sierra@xxxxxxx> >>>>>>>>> Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx> >>>>>>>>> Reviewed-by: Alistair Popple <apopple@xxxxxxxxxx> >>>>>>>>> [hch: rebased ontop of the refcount changes, >>>>>>>>> removed is_dev_private_or_coherent_page] >>>>>>>>> Signed-off-by: Christoph Hellwig <hch@xxxxxx> >>>>>>>>> --- >>>>>>>>> include/linux/memremap.h | 19 +++++++++++++++++++ >>>>>>>>> mm/memcontrol.c | 7 ++++--- >>>>>>>>> mm/memory-failure.c | 8 ++++++-- >>>>>>>>> mm/memremap.c | 10 ++++++++++ >>>>>>>>> mm/migrate_device.c | 16 +++++++--------- >>>>>>>>> mm/rmap.c | 5 +++-- >>>>>>>>> 6 files changed, 49 insertions(+), 16 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h >>>>>>>>> index 8af304f6b504..9f752ebed613 100644 >>>>>>>>> --- a/include/linux/memremap.h >>>>>>>>> +++ b/include/linux/memremap.h >>>>>>>>> @@ -41,6 +41,13 @@ struct vmem_altmap { >>>>>>>>> * A more complete discussion of unaddressable memory may be found in >>>>>>>>> * include/linux/hmm.h and Documentation/vm/hmm.rst. >>>>>>>>> * >>>>>>>>> + * MEMORY_DEVICE_COHERENT: >>>>>>>>> + * Device memory that is cache coherent from device and CPU point of view. This >>>>>>>>> + * is used on platforms that have an advanced system bus (like CAPI or CXL). A >>>>>>>>> + * driver can hotplug the device memory using ZONE_DEVICE and with that memory >>>>>>>>> + * type. Any page of a process can be migrated to such memory. However no one >>>>>>>> Any page might not be right, I'm pretty sure. ... just thinking about special pages >>>>>>>> like vdso, shared zeropage, ... pinned pages ... >>>>>> Well, you cannot migrate long term pages, that's what I meant :) >>>>>> >>>>>>>>> + * should be allowed to pin such memory so that it can always be evicted. >>>>>>>>> + * >>>>>>>>> * MEMORY_DEVICE_FS_DAX: >>>>>>>>> * Host memory that has similar access semantics as System RAM i.e. DMA >>>>>>>>> * coherent and supports page pinning. In support of coordinating page >>>>>>>>> @@ -61,6 +68,7 @@ struct vmem_altmap { >>>>>>>>> enum memory_type { >>>>>>>>> /* 0 is reserved to catch uninitialized type fields */ >>>>>>>>> MEMORY_DEVICE_PRIVATE = 1, >>>>>>>>> + MEMORY_DEVICE_COHERENT, >>>>>>>>> MEMORY_DEVICE_FS_DAX, >>>>>>>>> MEMORY_DEVICE_GENERIC, >>>>>>>>> MEMORY_DEVICE_PCI_P2PDMA, >>>>>>>>> @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) >>>>>>>> In general, this LGTM, and it should be correct with PageAnonExclusive I think. >>>>>>>> >>>>>>>> >>>>>>>> However, where exactly is pinning forbidden? >>>>>>> Long-term pinning is forbidden since it would interfere with the device >>>>>>> memory manager owning the >>>>>>> device-coherent pages (e.g. evictions in TTM). However, normal pinning >>>>>>> is allowed on this device type. >>>>>> I don't see updates to folio_is_pinnable() in this patch. >>>>> Device coherent type pages should return true here, as they are pinnable >>>>> pages. >>>> That function is only called for long-term pinnings in try_grab_folio(). >>>> >>>>>> So wouldn't try_grab_folio() simply pin these pages? What am I missing? >>>>> As far as I understand this return NULL for long term pin pages. >>>>> Otherwise they get refcount incremented. >>>> I don't follow. >>>> >>>> You're saying >>>> >>>> a) folio_is_pinnable() returns true for device coherent pages >>>> >>>> and that >>>> >>>> b) device coherent pages don't get long-term pinned >>>> >>>> >>>> Yet, the code says >>>> >>>> struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) >>>> { >>>> if (flags & FOLL_GET) >>>> return try_get_folio(page, refs); >>>> else if (flags & FOLL_PIN) { >>>> struct folio *folio; >>>> >>>> /* >>>> * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a >>>> * right zone, so fail and let the caller fall back to the slow >>>> * path. >>>> */ >>>> if (unlikely((flags & FOLL_LONGTERM) && >>>> !is_pinnable_page(page))) >>>> return NULL; >>>> ... >>>> return folio; >>>> } >>>> } >>>> >>>> >>>> What prevents these pages from getting long-term pinned as stated in this patch? >>> >>> Long-term pinning is handled by __gup_longterm_locked, which migrates >>> pages returned by __get_user_pages_locked that cannot be long-term >>> pinned. try_grab_folio is OK to grab the pages. Anything that can't be >>> long-term pinned will be migrated afterwards, and >>> __get_user_pages_locked will be retried. The migration of >>> DEVICE_COHERENT pages was implemented by Alistair in patch 5/13 >>> ("mm/gup: migrate device coherent pages when pinning instead of failing"). >> >> Thanks. >> >> __gup_longterm_locked()->check_and_migrate_movable_pages() >> >> Which checks folio_is_pinnable() and doesn't do anything if set. >> >> Sorry to be dense here, but I don't see how what's stated in this patch >> works without adjusting folio_is_pinnable(). > > Ugh, I think you might be right about try_grab_folio(). > > We didn't update folio_is_pinnable() to include device coherent pages > because device coherent pages are pinnable. It is really just > FOLL_LONGTERM that we want to prevent here. > > For normal PUP that is done by my change in > check_and_migrate_movable_pages() which migrates pages being pinned with > FOLL_LONGTERM. But I think I incorrectly assumed we would take the > pte_devmap() path in gup_pte_range(), which we don't for coherent pages. > So I think the check in try_grab_folio() needs to be: I think I said it already (and I might be wrong without reading the code), but folio_is_pinnable() is *only* called for long-term pinnings. It should actually be called folio_is_longterm_pinnable(). That's where that check should go, no? -- Thanks, David / dhildenb