Re: [PATCH 1/4] fs/proc/task_mmu: use folio API in pte_is_pinned()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2024/6/4 23:47, David Hildenbrand wrote:
On 04.06.24 16:26, Kefeng Wang wrote:


On 2024/6/4 19:51, David Hildenbrand wrote:
On 04.06.24 13:48, Kefeng Wang wrote:
Convert to use vm_normal_folio() and folio_maybe_dma_pinned() API,
which helps to remove page_maybe_dma_pinned() in the subsequent change.

Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>
---
   fs/proc/task_mmu.c | 8 ++++----
   1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f8d35f993fe5..5aceb3db7565 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1088,7 +1088,7 @@ struct clear_refs_private {
   static inline bool pte_is_pinned(struct vm_area_struct *vma,
unsigned long addr, pte_t pte)
   {
-    struct page *page;
+    struct folio *folio;
       if (!pte_write(pte))
           return false;
@@ -1096,10 +1096,10 @@ static inline bool pte_is_pinned(struct
vm_area_struct *vma, unsigned long addr,
           return false;
       if (likely(!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)))
           return false;
-    page = vm_normal_page(vma, addr, pte);
-    if (!page)
+    folio = vm_normal_folio(vma, addr, pte);
+    if (!folio)
           return false;
-    return page_maybe_dma_pinned(page);
+    return folio_maybe_dma_pinned(folio);
   }
   static inline void clear_soft_dirty(struct vm_area_struct *vma,

Likely we should just get rid of the pte_is_pinned() check completely
now. We don't perform the same for PMDs, we don't sync against GUP-fast,

Yes, no handle for clear PMDs.

and the original COW vs. GUP issue was resolved.

Agree, but I'm not sure about removing it, from the commit 9348b73c2e1bf,

    "The whole "track page soft dirty" state doesn't work with pinned pages
    anyway, since the page might be dirtied by the pinning entity without
    ever being noticed in the page tables."

The issue is between the pin mechanism and "track page soft dirty", if
the page is pinned, the pining entiry(DMA?) could change the page but
the pte dirty won't be set, so maybe we still need it, even add some
similar thing for PMD? Correct me if I'm wrong, thanks.

Yes, but it doesn't work with any mechanism that write-protects PTEs,
including mprotect() and uffd-wp.

Then, we never synced agaisnt concurrent GUP-fast, concurrent O_DIRECT
that might still use !FOLL_PIN, never handled PMD ... so it's all not
consistent nor really helpful nowdays.

... and if you have read-only pinned pages (which we cannot distinguish)


OK, many cases need to be addressed.

I have a proper patch lying around here for quite a while:

commit 9ef578b7aba8bba626b904fe90e5be0690842fd3
Author: David Hildenbrand <david@xxxxxxxxxx>
Date:   Wed Feb 16 20:39:43 2022 +0100

    fs/proc/task_mmu: allow setting pinned pages R/O for softdirty tracking
     Before we had PG_anon_exclusive, our new COW logic would end up
     replacing a pinned page in the page tables, detecting it as possibly
     shared. Consequently, we'd lost synchronicity between the page mapped
     into the page table and the page pin.
     We tried preventing mapping pinned pages R/O, however, history told us
    that that is impossible -- and we added PG_anon_exclusive to have it all
     working reliably again.
    Now that we have PG_anon_exclusive in place, let's get rid of the check for
     pinned pages and revert it to the old state.
     Yes, we won't be able to detect the following scenario correctly:
     (1) R/W-pin a page.
     (2) Clear softdirty.
     (3) Modify the page via the R/W-pin.
     However, that isn't working reliably right now either way, because
    * The current check is racy, because we can race with GUP-fast taking a       R/W pin while we're clearing the softdirty marker and mapping the page
       R/O.
     * The current check cannot handle FOLL_GET R/W references, as used for
       O_DIRECT. So if there is concurrent I/O the PTE will get marked as
       !softdirty, yet, direct I/O will modify page content afterwards.
     * We check for pins only when handling PTEs, but not for PMDs etc.
    Also, this handling is in no way different to other mechanisms (mprotect,
     uffd-wp) that map pages R/O to catch successive write access via the
     page table -- because acceses via the page pin no longer go logically
     via the page table, the page table is bypassed.
    With this change, the interface now works again as expected when we have
     R/O pins, and behaves just like any other mechanism that uses write
    protection to catch successive writes (mprotect, uffd-wp) -- and they all
     face the same issue regarding R/W access via GUP (FOLL_PIN and
     FOLL_GET).
    User space better be aware that using read-protection to catch writes to     a page can miss writes via GUP. Softdirt tracking cannot reliably catch
     modifications via GUP after clearing softdirty and returning to user
     space.

But I understand if you want to be careful :) So I might send that patch out at
some point myself ...


Thank for your detail explanation, let's wait it out.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux