On Thu, Sep 28, 2023 at 12:33 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Thu, Sep 28, 2023 at 2:05 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Sep 26, 2023 at 03:07:18PM -0700, Yang Shi wrote:
> > > On Fri, Sep 22, 2023 at 9:33 PM Vishal Moola (Oracle)
> > > <vishal.moola@xxxxxxxxx> wrote:
> > > >
> > > > Currently, khugepaged builds a compound_pagelist while scanning, which
> > > > is used to properly account for compound pages. We can now account
> > > > for a compound page as a singular folio instead, so remove this list.
> > > >
> > > > Large folios are guaranteed to have consecutive ptes and addresses, so
> > > > once the first pte of a large folio is found skip over the rest.
> > >
> > > The address space may just map a partial folio, for example, in the
> > > extreme case the HUGE_PMD size range may have HUGE_PMD_NR folios with
> > > mapping one subpage from each folio per PTE. So assuming the PTE
> > > mapped folio is mapped consecutively may be wrong.
> >
> > How? You can do that with two VMAs, but this is limited to scanning
> > within a single VMA. If we've COWed a large folio, we currently do
> > so as a single page folio, and I'm not seeing any demand to change that.
> > If we did COW as a large folio, we'd COW every page in that folio.
> > How do we interleave two large folios in the same VMA?
>
> It is not about COW. The magic from mremap() may cause some corner
> cases. For example,
>
> We have a 2M VMA, every 4K of the VMA may be mapped to a subpage from
> different folios. Like:
>
> 0: #0 subpage of folio #0
> 1: #1 subpage of folio #1
> 2: #2 subpage of folio #2
> ....
> 511: #511 subpage of folio #511
>
> When khugepaged is scanning the VMA, it may just isolate and lock the
> folio #0, but skip all other folios since it assumes the VMA is just
> mapped by folio #0.
>
> This may trigger kernel bug when unlocking other folios which are
> actually not locked and maybe data corruption since the other folios
> may go away under us (unisolated, unlocked and unpinned).

Thanks for the review. I did not know this could happen; I'll drop this
patch for now until I can think of a better way to iterate through ptes
for large folios.
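
To make the corner case concrete, here is a toy userspace model of the
layout Yang describes. It is only a sketch and not kernel code: the
function names are made up for illustration, each PTE is reduced to a
(folio id, subpage index) pair, and HPAGE_PMD_NR is assumed to be 512
(4K pages under a 2M PMD). It compares a scan that skips ahead after
the first pte of a large folio with a scan that looks at every pte.

```python
# Toy model only; every name below is hypothetical, not a kernel API.
HPAGE_PMD_NR = 512  # assumption: 4K PTEs covering one 2M PMD range


def build_interleaved_ptes(nr=HPAGE_PMD_NR):
    """PTE i maps subpage #i of folio #i (the mremap() corner case)."""
    return [(i, i) for i in range(nr)]


def build_contiguous_ptes(nr=HPAGE_PMD_NR):
    """Every PTE maps consecutive subpages of one large folio #0."""
    return [(0, i) for i in range(nr)]


def mapped_folios(ptes):
    """Folios actually referenced by the PTEs in the range."""
    return {folio for folio, _ in ptes}


def scan_with_skip(ptes, folio_nr_pages=HPAGE_PMD_NR):
    """Lock the folio behind a PTE, then skip the PTEs it is ASSUMED to cover."""
    locked = set()
    i = 0
    while i < len(ptes):
        folio, subpage = ptes[i]
        locked.add(folio)
        # Assumes the remaining subpages of this folio follow consecutively.
        i += folio_nr_pages - subpage
    return locked


def scan_every_pte(ptes):
    """Lock the folio behind every PTE, with no skipping."""
    locked = set()
    for folio, _ in ptes:
        locked.add(folio)
    return locked


if __name__ == "__main__":
    ptes = build_interleaved_ptes()
    missed = mapped_folios(ptes) - scan_with_skip(ptes)
    print(f"folios mapped: {len(mapped_folios(ptes))}, "
          f"never locked by the skipping scan: {len(missed)}")
```

In this model the skipping scan locks only folio #0 and never visits
the other 511 folios, so a later pass that unlocks every mapped folio
would touch folios that were never locked or isolated. With the
contiguous layout both scans agree, which is why the shortcut looks
safe until the mapping is interleaved.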