On Tue, Feb 07, 2023 at 11:19:40AM -0500, Zi Yan wrote:
> On 2 Feb 2023, at 10:31, Matthew Wilcox wrote:
> > On Wed, Feb 01, 2023 at 07:45:17PM -0800, Mike Kravetz wrote:
> >> On 01/24/23 18:13, Matthew Wilcox wrote:
> >>> Once we get to the part of the folio journey where we have
> >>> one-pointer-per-page, we can't afford to maintain per-page state.
> >>> Currently we maintain a per-page mapcount, and that will have to go.
> >>> We can maintain extra state for a multi-page folio, but it has to be a
> >>> constant amount of extra state no matter how many pages are in the folio.
> >>>
> >>> My proposal is that we maintain a single mapcount per folio, and its
> >>> definition is the number of (vma, page table) tuples which have a
> >>> reference to any pages in this folio.
> >>
> >> Hi Matthew, finally took a look at this. Can you clarify your definition of
> >> 'page table' here? I think you are talking about all the entries within
> >> one page table page? Is that correct? It certainly makes sense in this
> >> context.
> >>
> >> I have always thought of page table as the entire tree structure starting at
> >> *pgd in the mm_struct. So, I was a bit confused. But, I now see elsewhere
> >> that 'page table' may refer to either.
> >
> > Yes, we're pretty sloppy about that. What I had in mind was:
> >
> > We have a large folio which is mapped at, say, (1.9MB - 2.1MB) in the
> > user address space. There are thus multiple PTEs which map it and some
> > of those PTEs belong to one PMD and the rest belong to a second PMD.
> > It has a mapcount of 2 due to being mapped by PTE entries belonging to
> > two PMD tables. If it were mapped at (2.1-2.3MB), it would have a
> > mapcount of 1 due to all its PTEs belonging to a single PMD table.
>
> What is the logic of using PMD as the basic counting unit? Why not use
> PTE or PUD? I just cannot understand the goal of doing this.

Locking and contiguity.
If we try to map a folio across a PMD boundary, we have to have the PTL on both PMDs at the same time (or all PMDs if we support folios larger than PMD_SIZE). Then we have to make two (or more) calls to set_ptes() to populate all the PTEs (so that arches don't have to handle "Oh, I reached the end of the PMD, move to the next one").

Note that I've decided this approach doesn't work because it can't easily tell us "Am I the only VMA which has this folio mapped?" But this was the reason.
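
To make the counting and the per-PMD split concrete, here is a rough userspace sketch, not kernel code. It assumes x86-64 with 4KB pages, so one PMD entry covers 2MB; the names sketch_pmds_spanned(), sketch_split_by_pmd() and struct sketch_chunk are made up for illustration. The first helper gives the number of PMDs a mapping touches (the proposed mapcount contribution for one VMA), and the second cuts the range at PMD boundaries, one piece per PTL and per set_ptes() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: x86-64 with 4KB pages, where one PMD entry covers 2MB. */
#define SKETCH_PMD_SHIFT 21
#define SKETCH_PMD_SIZE  (UINT64_C(1) << SKETCH_PMD_SHIFT)

/* Hypothetical type: one piece of a mapping that lies under a single PMD. */
struct sketch_chunk {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Number of PMD entries whose PTEs cover the half-open user address
 * range [start, end). Under the proposal discussed above, this is what
 * one VMA mapping the folio at that range would add to the mapcount.
 */
static unsigned long sketch_pmds_spanned(uint64_t start, uint64_t end)
{
	assert(start < end);
	uint64_t first = start >> SKETCH_PMD_SHIFT;
	uint64_t last = (end - 1) >> SKETCH_PMD_SHIFT;
	return (unsigned long)(last - first + 1);
}

/*
 * Split [start, end) at PMD boundaries. Each resulting chunk stands for
 * one batch of PTEs that would be populated under that PMD's PTL.
 * Writes at most max chunks and returns how many were written.
 * Does not guard against overflow at the very top of the 64-bit
 * address space, which user addresses do not reach.
 */
static size_t sketch_split_by_pmd(uint64_t start, uint64_t end,
				  struct sketch_chunk *out, size_t max)
{
	size_t n = 0;

	assert(start < end);
	while (start < end && n < max) {
		/* First address belonging to the next PMD entry. */
		uint64_t boundary = (start | (SKETCH_PMD_SIZE - 1)) + 1;
		uint64_t stop = boundary < end ? boundary : end;

		out[n].start = start;
		out[n].end = stop;
		n++;
		start = stop;
	}
	return n;
}
```

With a range that straddles the 2MB boundary, say 0x1E0000 to 0x220000, sketch_pmds_spanned() gives 2 and the split yields two chunks meeting at 0x200000. Moving the same-sized range to 0x220000 gives 1 and a single chunk, matching the two cases in the quoted example.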