Re: A mapcount riddle

On Wed, Jan 25, 2023 at 8:02 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Tue, Jan 24, 2023 at 03:29:53PM -0800, Yang Shi wrote:
> > On Tue, Jan 24, 2023 at 3:00 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jan 24, 2023 at 12:56:24PM -0800, Mike Kravetz wrote:
> > > > Q How can a page be mapped into multiple processes and have a
> > > >   mapcount of 1?
> > > >
> > > > A It is a hugetlb page referenced by a shared PMD.
> > > >
> > > > I was looking to expose some basic information about PMD sharing via
> > > > /proc/smaps.  After adding the code, I started a couple processes
> > > > sharing a large hugetlb mapping that would result in the use of
> > > > shared PMDs.  When I looked at the output of /proc/smaps, I saw
> > > > my new metric counting the number of shared PMDs.  However, what
> > > > stood out was that the entire mapping was listed as Private_Hugetlb.
> > > > WTH???  It certainly was shared!  The routine smaps_hugetlb_range
> > > > decides between Private_Hugetlb and Shared_Hugetlb with this code:
> > > >
> > > >       if (page) {
> > > >               int mapcount = page_mapcount(page);
> > > >
> > > >               if (mapcount >= 2)
> > > >                       mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
> > > >               else
> > > >                       mss->private_hugetlb += huge_page_size(hstate_vma(vma));
> > > >       }
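
(Side note, in case anyone wants to reproduce this locally: below is a
minimal userspace sketch, assuming x86-64 with 2MB default hugetlb pages,
at least 512 pages reserved in the pool via /proc/sys/vm/nr_hugepages, a
kernel with hugetlb PMD sharing enabled, and glibc >= 2.27 for
memfd_create().  Both processes map and fault the same 1GB hugetlb region
MAP_SHARED, yet smaps still reports the whole range as Private_Hugetlb
because of the mapcount check above.)

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAP_SZ   (1UL << 30)            /* one PUD's worth of 2MB PMDs */
#define HPAGE_SZ (2UL << 20)            /* assumed default hugepage size */

int main(void)
{
	/* hugetlb-backed shared memory; needs >= 512 free 2MB pages */
	int fd = memfd_create("pmd-share-demo", MFD_CLOEXEC | MFD_HUGETLB);
	if (fd < 0 || ftruncate(fd, MAP_SZ) < 0) {
		perror("memfd/ftruncate");
		return 1;
	}

	/*
	 * PMD sharing only applies to a PUD-aligned, PUD-sized chunk of
	 * the VA range; the address below is only a hint to get alignment.
	 */
	char *p = mmap((void *)(1UL << 40), MAP_SZ, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pid_t child = fork();

	/* Fault the whole range in both parent and child. */
	for (size_t off = 0; off < MAP_SZ; off += HPAGE_SZ)
		p[off] = 1;

	if (child == 0) {
		/* Dump the Shared_Hugetlb/Private_Hugetlb counters. */
		char line[256];
		FILE *f = fopen("/proc/self/smaps", "r");
		while (f && fgets(line, sizeof(line), f))
			if (strstr(line, "Hugetlb:"))
				fputs(line, stdout);
		_exit(0);
	}
	wait(NULL);
	return 0;
}
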
> > >
> > > This is definitely unfortunate..
> > >
> > > >
> > > > After spending some time looking for issues in the page_mapcount code,
> > > > I came to the realization that the mapcount of hugetlb pages only
> > > > referenced by a shared PMD would be 1 no matter how many processes had
> > > > mapped the page.  When a page is first faulted, the mapcount is set to 1.
> > > > When faulted in other processes, the shared PMD is added to the page
> > > > table of the other processes.  No increase of mapcount will occur.
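
To make the bookkeeping concrete, here is a toy userspace model of my
understanding of that flow (not kernel code, names made up): the data
page's mapcount is bumped only when the PMD entry is populated for the
first time; a process that shares an existing PMD table just takes a
reference on the table page and then finds the entry already present, so
the data page's mapcount is never touched again.

#include <stdio.h>

/* Toy model: one shared PMD table, one hugetlb data page. */
struct page { int mapcount; int refcount; };

static struct page hugetlb_page = { 0, 1 };  /* the 2MB data page         */
static struct page pmd_table    = { 0, 1 };  /* page holding the PMDs     */
static int pmd_entry_present;                /* "is the entry populated?" */

/* Fault path: only the first populate accounts a mapping. */
static void hugetlb_fault_model(void)
{
	if (pmd_entry_present)
		return;                      /* entry already there, done */
	pmd_entry_present = 1;
	hugetlb_page.mapcount++;             /* the one and only rmap add */
}

/* A later process shares the existing PMD table instead of building its
 * own: only the table page's refcount goes up. */
static void share_pmd_model(void)
{
	pmd_table.refcount++;                /* get_page() on the table   */
	hugetlb_fault_model();               /* sees a present entry      */
}

int main(void)
{
	hugetlb_fault_model();               /* process A faults          */
	share_pmd_model();                   /* process B shares + faults */
	share_pmd_model();                   /* process C shares + faults */
	printf("mapcount=%d with 3 processes mapping the page\n",
	       hugetlb_page.mapcount);       /* prints mapcount=1         */
	return 0;
}
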
> > > >
> > > > At first thought this seems bad.  However, I believe this has been the
> > > > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > > > unaware of any reported issues.  I did an audit of code looking at
> > > > mapcount.  In addition to the above issue with smaps, there appears
> > > > to be an issue with 'migrate_pages' where shared pages could be migrated
> > > > without appropriate privilege.
> > > >
> > > >       /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> > > >       if (flags & (MPOL_MF_MOVE_ALL) ||
> > > >           (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> > > >               if (isolate_hugetlb(page, qp->pagelist) &&
> > > >                       (flags & MPOL_MF_STRICT))
> > > >                       /*
> > > >                        * Failed to isolate page but allow migrating pages
> > > >                        * which have been queued.
> > > >                        */
> > > >                       ret = 1;
> > > >       }
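
(One possible direction for this path, just a sketch: keep the
mapcount==1 test, but also refuse to treat the page as unshared when it
is mapped through a shared PMD.  hugetlb_pmd_shared() below is a
hypothetical helper that would report exactly that; the name and
signature are made up here.)

	/* With MPOL_MF_MOVE, we migrate only unshared hugepages. */
	if (flags & (MPOL_MF_MOVE_ALL) ||
	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1 &&
	     !hugetlb_pmd_shared(pte))) {
		...
	}
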
> > > >
> > > > I will prepare fixes for both of these.  However, I wanted to ask if
> > > > anyone has ideas about other potential issues with this?
> > >
> > > This made me wonder whether things should be checked even before this
> > > happens.  E.g. when trying to share a pmd, would it make sense to check
> > > the vma mempolicy before doing so?
> > >
> > > Then the question is, if pmd sharing only happens between vmas that
> > > share the same memory policy, whether the above mapcount==1 check would
> > > be acceptable even if the page is shared by multiple processes.
> >
> > I don't think so. One process might change its policy, for example,
> > bind to another node, which would then result in migrating the hugepage
> > due to the incorrect mapcount. The above example code pasted by Mike
> > actually comes from mbind if I remember correctly.
>
> Yes, or any page migrations.  The above was a purely wild idea that we
> share the pmd based on vma attributes matching first (shared, mapping
> alignments, etc.).  It could also take mempolicy into account so that when
> migrating one page on the shared pmd, one could make a decision for all
> with mapcount==1, because that single mapcount may stand for all the
> sharers of the page as long as they share the same mempolicy.
>
> If the above idea applies, we'll also need to unshare during mbind() when
> the mempolicy of the vma changes for hugetlb in this path, because once
> the mempolicy changes the vma attributes no longer match, so pmd sharing
> no longer holds.

Makes sense to me. VMAs with different mempolicies can't be merged, so they
shouldn't share a PMD either.
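
If we ever went that way, the natural place to enforce it would probably
be the walk over i_mmap that looks for a candidate vma to share with.
Purely illustrative, assuming the existing mpol_equal()/vma_policy()
helpers could be used there:

	/* Inside the loop that searches i_mmap for a candidate svma: */
	if (!mpol_equal(vma_policy(vma), vma_policy(svma)))
		continue;   /* don't share a PMD across different policies */
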

>
> But please also ignore this whole thing - I don't think it'll resolve the
> generic mapcount problem with pmd sharing no matter what.  It's just
> something that came to mind when I read it.
>
> >
> > I'm wondering whether we could use the refcount instead of the mapcount
> > to determine whether a hugetlb page is shared or not, assuming
> > refcounting for hugetlb pages behaves similarly to base pages
> > (incremented when mapped by a new process or pinned). If it is pinned
> > (for example, by GUP) we can't migrate it either.
>
> I think the refcount has the same issue because it's not bumped either
> when the pmd page is shared.

Good to know that. It is a little bit counter-intuitive TBH.

>
> --
> Peter Xu
>



