(sorry for a very late reply) On Thu, Jul 01, 2021 at 11:29:50AM -0700, Linus Torvalds wrote: > On Wed, Jun 30, 2021 at 6:27 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > On Wed, Jun 30, 2021 at 11:03:25AM -0700, Linus Torvalds wrote: > > > > > > What about that page_count() test, for example: it has a comment, it > > > looks obvious, but it's very different from what do_wp_page() does. So > > > what happens if we have a page-out at the same time that turns that > > > page into a swap cache page, and increments the page count? What about > > > that race? Do we end up with a writable page that is shared with a > > > swap cache entry? Is that ok? Why isn't it ok in the page fault case? > > > > That looks fine to me: when the race happens we must have checked page_count==1 > > first and granted the write bit, then add_to_swap_cache() happens after the > > page_count==1 check (as it takes another refcount, so >2 otherwise). Then it > > also means unmap mappings should happen even after that point. If my above > > understanding is correct, our newly installed pte will be zapped safely soon, > > but of course after we release the pgtable lock in change_pte_range(). > > So if this is fine, then maybe we should just remove the page lock in > the do_wp_page() path (and remove the PageKSM check while at it)? I could be wrong, but I thought the page lock in do_wp_page() is more for the PageKsm() race (e.g., to make sure we don't grant write to a page that is becoming a ksm page in parallel). > > If it's not required by mprotect() to say "I can make the page > writable directly", then it really shouldn't be required by the page > fault path either. > > Which I'd love to do, and was really itching to do (it's a nasty > lock), but I worried about it.. > > I'd hate to have mprotect do one thing, and page faulting do another > thing, and not have some logic to why they have to be different. Agreed; perhaps no need to be identical - I think the mprotect path can be even stricter than the fault page, as it's a fast-path only. It should never apply the write bit when the page fault path won't. So I think the original patch does need a justification on why it didn't handle ksm page while do_wp_page handled it. Thanks, -- Peter Xu