On 20.01.22 16:36, Matthew Wilcox wrote: > On Thu, Jan 20, 2022 at 04:26:22PM +0100, David Hildenbrand wrote: >> On 20.01.22 15:39, Matthew Wilcox wrote: >>> On Thu, Jan 20, 2022 at 03:15:37PM +0100, David Hildenbrand wrote: >>>> On 17.01.22 14:31, zhangliang (AG) wrote: >>>>> Sure, I will do that :) >>>> >>>> I'm polishing up / testing the patches and might send something out for discussion shortly. >>>> Just a note that on my branch was a version with a wrong condition that should have been fixed now. >>>> >>>> I am still thinking about PTE mapped THP. For these, we'll always >>>> have page_count() > 1, essentially corresponding to the number of still-mapped sub-pages. >>>> >>>> So if we end up with a R/O mapped part of a THP, we'll always have to COW and cannot reuse ever, >>>> although it's really just a single process mapping the THP via PTEs. >>>> >>>> One approach would be to scan the currently locked page table for entries mapping >>>> this same page. If page_count() corresponds to that value, we know that only we are >>>> mapping the THP and there are no additional references. That would be a special case >>>> if we find an anon THP in do_wp_page(). Hm. >>> >>> You're starting to optimise for some pretty weird cases at that point. >> >> So your claim is that read-only, PTE mapped pages are weird? How do you >> come to that conclusion? > > Because normally anon THP pages are PMD mapped. That's rather > the point of anon THPs. For example unless we are talking about *drumroll* COW handling. > >> If we adjust the THP reuse logic to split on additional references >> (page_count() == 1) -- similarly as suggested by Linus to fix the CVE -- >> we're going to end up with exactly that more frequently. > > I don't understand. Are we talking past each other? As I understand > the situation we're talking about here, a process has created a THP, > done something to cause it to be partially mapped (or mapped in a > misaligned way) in its own address space, then forked, and we're > trying to figure out if it's safe to reuse it? I say that situation is > rare enough that it's OK to always allocate an order-0 page and > copy into it. Yes, we are talking past each other and no, I am talking about fully mapped THP, just mapped via PTEs. Please refer to our THP COW logic: do_huge_pmd_wp_page() > >>> Anon THP is always going to start out aligned (and can be moved by >>> mremap()). Arguably it should be broken up if it's moved so it can be >>> reformed into aligned THPs by khugepaged. >> >> Can you elaborate, I'm missing the point where something gets moved. I >> don't care about mremap() at all here. >> >> >> 1. You have a read-only, PTE mapped THP >> 2. Write fault on the THP >> 3. We PTE-map the THP because we run into a false positive in our COW >> logic to handle COW on PTE >> 4. Write fault on the PTE >> 5. We always have to COW each and every sub-page and can never reuse, >> because page_count() > 1 >> >> That's essentially what reuse_swap_page() tried to handle before. >> Eventually optimizing for this is certainly the next step, but I'd like >> to document which effect the removal of reuse_swap_page() will have to THP. > > I'm talking about step 0. How do we get a read-only, PTE-mapped THP? > Through mremap() or perhaps through an mprotect()/mmap()/munmap() that > failed to split the THP. do_huge_pmd_wp_page() -- Thanks, David / dhildenb