Re: [PATCH v2 03/10] mm/hugetlb: Document huge_pte_offset usage

On 07.12.22 21:49, John Hubbard wrote:
On 12/7/22 12:30, Peter Xu wrote:
huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
hugetlb address.

Normally, it is safe to walk a generic pgtable as long as the mmap lock is
held for either read or write, because that guarantees the pgtable pages
remain valid during the walk.

But that is not true for hugetlbfs, especially for shared mappings:
hugetlbfs can have its pgtable freed by pmd unsharing. This means that even
with the mmap lock held for the current mm, the PMD pgtable page can still
go away from under us if pmd unsharing is possible during the walk.

So we have two ways to make it safe even for a shared mapping:

    (1) If the hugetlb vma lock is held for either read or write, it's
        okay because pmd unshare cannot happen at all.

    (2) If the i_mmap_rwsem lock is held for either read or write, it's
        okay because even if pmd unshare can happen, the pgtable page cannot
        be freed from under us.

Document it.
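
As a rough illustration only, the locking rule described above can be modelled as a small userspace predicate. This is not kernel code: struct walk_ctx and huge_pte_walk_is_safe() are hypothetical names invented for this sketch, and it assumes the rule is exactly as stated in this thread (mmap lock suffices for private mappings; shared mappings need the hugetlb vma lock or i_mmap_rwsem).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the rule discussed in this thread.
 * None of these names exist in the kernel; they only record which
 * locks a caller holds around a huge_pte_offset() walk.
 */
struct walk_ctx {
	bool shared_mapping;     /* true for a VM_SHARED hugetlb mapping */
	bool mmap_lock_held;     /* mmap lock held for read or write */
	bool vma_lock_held;      /* hugetlb vma lock held for read or write */
	bool i_mmap_rwsem_held;  /* i_mmap_rwsem held for read or write */
};

/* Returns true if the walk is safe under the rule as stated above. */
static bool huge_pte_walk_is_safe(const struct walk_ctx *c)
{
	/* Private mappings: pmd unsharing is impossible, mmap lock suffices. */
	if (!c->shared_mapping)
		return c->mmap_lock_held;

	/*
	 * Shared mappings: the mmap lock alone is not enough. Either the
	 * vma lock (pmd unshare cannot happen) or i_mmap_rwsem (the pgtable
	 * page cannot be freed) must be held.
	 */
	return c->vma_lock_held || c->i_mmap_rwsem_held;
}
```

For example, a shared mapping walked with only the mmap lock held is reported as unsafe by this model, while the same walk with the vma lock taken for read is reported as safe.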

Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
   include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++
   1 file changed, 32 insertions(+)

Looks good, with a couple of minor wording tweaks below that you might
consider folding in, but either way,

Reviewed-by: John Hubbard <jhubbard@xxxxxxxxxx>


diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 551834cd5299..81efd9b9baa2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
   			unsigned long addr, unsigned long sz);
+/*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * Since this function will walk all the pgtable pages (including not only
+ * high-level pgtable page, but also PUD entry that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible of its thread safety.  One can follow this rule:

       "responsible for"

+ *
+ *  (1) For private mappings: pmd unsharing is not possible, so it'll
+ *      always be safe if we're with the mmap sem for either read or write.

mmap sem is sooo two years ago! :)

+ *      This is normally always the case, IOW we don't need to do anything

"normally always" hurts my sense of logic. And "IOW" is for typing very quickly
in chats or email, not for long term documentation that is written rarely
and read many times.

+ *      special.

So putting all that together, maybe:

   *  (1) For private mappings: pmd unsharing is not possible, so holding the
   *      mmap_lock for either read or write is sufficient. Most callers already
   *      hold the mmap_lock, so normally, no special action is required.

With that,

Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>

--
Thanks,

David / dhildenb




