Re: Folio mapcount


On Tue, Feb 07, 2023 at 04:35:30PM -0800, James Houghton wrote:
> On Tue, Feb 7, 2023 at 3:35 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Feb 07, 2023 at 03:27:07PM -0800, James Houghton wrote:
> > > So page_vma_mapped_walk() might have to walk up to HPAGE_PMD_NR-ish
> > > PTEs (if we find a bunch of pte_none() PTEs). Just curious, could that
> > > be any slower than what we currently do (like, incrementing up to
> > > HPAGE_PMD_NR-ish subpage mapcounts)? Or is it not a concern?
> >
> > I think it's faster.  Both of these operations work on folio_nr_pages()
> > entries ... but a page table entry is 8 bytes and a struct page is 64 bytes.
> > From a CPU prefetching point of view, they're both linear scans, but
> > PTEs are 8 times denser.
> 
> >
> > The other factor to consider is how often we do each of these operations.
> > Mapping a folio happens ~once per call to mmap() (even though it's delayed
> > until page fault time).  Querying folio_total_mapcount() happens ... less
> > often, I think?  Both are going to be quite rare since generally we map
> > the entire folio at once.
> 
> Maybe this is a case where we would see a regression: doing PAGE_SIZE
> UFFDIO_CONTINUEs on a THP. Worst case, go from the end of the THP to
> the beginning (ending up with a PTE-mapped THP at the end).
> 
> For the i'th PTE we map / i'th UFFDIO_CONTINUE, we have to check
> `folio_nr_pages() - i` PTEs (for most of the iterations anyway). Seems
> like this scales with the square of the size of the folio, so this
> approach would be kind of a non-starter for HugeTLB (with
> high-granularity mapping), I think.
> 
> This example isn't completely contrived: if we did post-copy live
> migration with userfaultfd, we might end up doing something like this.
> I'm curious what you think. :)
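
To put a rough number on the quadratic scaling described above, here is a
minimal userspace sketch. It is a cost model only, not kernel code: the
function name is hypothetical, and it assumes each of the N mappings scans
exactly `N - i` PTE slots, as in the quoted worst case.

```c
#include <assert.h>

/*
 * Hypothetical cost model, not kernel code: count the PTE slots examined
 * when a folio of nr_pages is mapped one page at a time and the i'th
 * mapping (1-based) has to examine nr_pages - i slots.
 */
static unsigned long long pte_checks_worst_case(unsigned long long nr_pages)
{
	unsigned long long total = 0;

	assert(nr_pages > 0);
	for (unsigned long long i = 1; i <= nr_pages; i++)
		total += nr_pages - i;

	/* Closed form: nr_pages * (nr_pages - 1) / 2 */
	return total;
}
```

Assuming 4 KiB base pages, a 2 MiB THP has 512 pages and the model gives
130,816 checks; a 1 GiB hugetlb page has 262,144 pages and the model gives
roughly 3.4e10 checks, which is why the concern is sharper for HugeTLB.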

I think that's a great corner-case to consider.  For hugetlb pages,
we know they're PMD/PUD aligned, so _if_ there's a page table present,
at least one page from the folio is already mapped, and we don't need
to look in the page table to find which one.  Similarly, because the folio
occupies the entire PMD/PUD whenever any part of it is mapped, we don't
need to iterate within it.  And contrariwise, if it's p*d_none(), then
definitely none of the pages are mapped.

That perhaps calls for using a different implementation than
page_vma_mapped_walk(), which should be worth it to optimise this case.
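
The three cases above can be sketched as a constant-time decision on the
PMD/PUD slot. This is an illustrative userspace model only: the enums and
the function name are hypothetical stand-ins, not kernel types or an
existing kernel API, and it relies on the property stated above that a
present page table implies at least one mapped page of the folio.

```c
/* Hypothetical stand-in for the state of a PMD/PUD slot; not a kernel type. */
enum slot_state {
	SLOT_NONE,	/* p*d_none(): no entry at this level */
	SLOT_LEAF,	/* huge leaf entry: one entry maps the whole folio */
	SLOT_TABLE,	/* entry points to a lower-level page table */
};

enum map_result {
	FOLIO_NOT_MAPPED,	/* definitely no page of the folio is mapped */
	FOLIO_MAPPED_WHOLE,	/* mapped by a single huge entry */
	FOLIO_MAPPED_PARTLY,	/* at least one page mapped via PTEs */
};

/*
 * Decide whether an aligned hugetlb folio is mapped by looking only at
 * the PMD/PUD slot covering it, without scanning the PTEs underneath.
 */
static enum map_result hugetlb_folio_mapped(enum slot_state state)
{
	switch (state) {
	case SLOT_LEAF:
		return FOLIO_MAPPED_WHOLE;
	case SLOT_TABLE:
		/* Assumption from the text: a table exists only if some page is mapped. */
		return FOLIO_MAPPED_PARTLY;
	case SLOT_NONE:
	default:
		return FOLIO_NOT_MAPPED;
	}
}
```

The point of the sketch is that the answer never depends on the number of
pages in the folio, which would avoid the quadratic behaviour in the
UFFDIO_CONTINUE case.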



