On Wednesday, 7 July 2021 1:35:18 AM AEST Peter Xu wrote:
> On Tue, Jul 06, 2021 at 03:40:42PM +1000, Alistair Popple wrote:
> > > > > > > > >  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > >  			     pte_t pte);
> > > > > > > > >  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> > > > > > > > > index 355ea1ee32bd..c29a6ef3a642 100644
> > > > > > > > > --- a/include/linux/mm_inline.h
> > > > > > > > > +++ b/include/linux/mm_inline.h
> > > > > > > > > @@ -4,6 +4,8 @@
> > > > > > > > >
> > > > > > > > >  #include <linux/huge_mm.h>
> > > > > > > > >  #include <linux/swap.h>
> > > > > > > > > +#include <linux/userfaultfd_k.h>
> > > > > > > > > +#include <linux/swapops.h>
> > > > > > > > >
> > > > > > > > >  /**
> > > > > > > > >   * page_is_file_lru - should the page be on a file LRU or anon LRU?
> > > > > > > > > @@ -104,4 +106,45 @@ static __always_inline void del_page_from_lru_list(struct page *page,
> > > > > > > > >  	update_lru_size(lruvec, page_lru(page), page_zonenum(page),
> > > > > > > > >  			-thp_nr_pages(page));
> > > > > > > > >  }
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
> > > > > > > > > + * replace a none pte. NOTE! This should only be called when *pte is already
> > > > > > > > > + * cleared so we will never accidentally replace something valuable. Meanwhile
> > > > > > > > > + * none pte also means we are not demoting the pte so if tlb flushed then we
> > > > > > > > > + * don't need to do it again; otherwise if tlb flush is postponed then it's
> > > > > > > > > + * even better.
> > > > > > > > > + *
> > > > > > > > > + * Must be called with pgtable lock held.
> > > > > > > > > + */
> > > > > > > > > +static inline void
> > > > > > > > > +pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > > +			      pte_t *pte, pte_t pteval)
> > > > > > > > > +{
> > > > > > > > > +#ifdef CONFIG_USERFAULTFD
> > > > > > > > > +	bool arm_uffd_pte = false;
> > > > > > > > > +
> > > > > > > > > +	/* The current status of the pte should be "cleared" before calling */
> > > > > > > > > +	WARN_ON_ONCE(!pte_none(*pte));
> > > > > > > > > +
> > > > > > > > > +	if (vma_is_anonymous(vma))
> > > > > > > > > +		return;
> > > > > > > > > +
> > > > > > > > > +	/* A uffd-wp wr-protected normal pte */
> > > > > > > > > +	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
> > > > > > > > > +		arm_uffd_pte = true;
> > > > > > > > > +
> > > > > > > > > +	/*
> > > > > > > > > +	 * A uffd-wp wr-protected swap pte. Note: this should even work for
> > > > > > > > > +	 * pte_swp_uffd_wp_special() too.
> > > > > > > > > +	 */
> > > > > > > >
> > > > > > > > I'm probably missing something but when can we actually have this case and why
> > > > > > > > would we want to leave a special pte behind? From what I can tell this is
> > > > > > > > called from try_to_unmap_one() where this won't be true or from zap_pte_range()
> > > > > > > > when not skipping swap pages.
> > > > > > > >
> > > > > > > Yes this is a good question..
> > > > > > >
> > > > > > > Initially I made this function make sure I cover all forms of uffd-wp bit, that
> > > > > > > contains both swap and present ptes; imho that's pretty safe. However for
> > > > > > > !anonymous cases we don't keep swap entry inside pte even if swapped out, as
> > > > > > > they should reside in shmem page cache indeed. The only missing piece seems to
> > > > > > > be the device private entries as you also spotted below.
> > > > > > >
> > > > > > Yes, I think it's *probably* safe although I don't yet have a strong opinion
> > > > > > here ...
> > > > > >
> > > > > > > > > +	if (unlikely(is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)))
> > > > > >
> > > > > > ... however if this can never happen would a WARN_ON() be better? It would also
> > > > > > mean you could remove arm_uffd_pte.
> > > > > >
> > > > > Hmm, after a second thought I think we can't make it a WARN_ON_ONCE().. this
> > > > > can still be useful for private mapping of shmem files: in that case we'll have
> > > > > swap entry stored in pte not page cache, so after page reclaim it will contain
> > > > > a valid swap entry, while it's still "!anonymous".
> > [1]
> > > > >
> > > > There's something (probably obvious) I must still be missing here. During
> > > > reclaim won't a private shmem mapping still have a present pteval here?
> > > > Therefore it won't trigger this case - the uffd wp bit is set when the swap
> > > > entry is established further down in try_to_unmap_one() right?
> > > >
> > > I agree if it's at the point when it get reclaimed, however what if we zap a
> > > pte of a page already got reclaimed? It should have the swap pte installed,
> > > imho, which will have "is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)"==true.
> > >
> > Apologies for the delay getting back to this, I hope to find some more time
> > to look at this again this week.
> >
> No problem, please take your time on reviewing the series.
>
> >
> > I guess what I am missing is why we care about a swap pte for a reclaimed page
> > getting zapped. I thought that would imply the mapping was getting torn down,
> > although I suppose in that case you still want the uffd-wp to apply in case a
> > new mapping appears there?
> >
> For the torn down case it'll always have ZAP_FLAG_DROP_FILE_UFFD_WP set, so
> pte_install_uffd_wp_if_needed() won't be called, as zap_drop_file_uffd_wp()
> will return true:

Argh, thanks. I had forgotten that bit.

> static inline void
> zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
> 			      unsigned long addr, pte_t *pte,
> 			      struct zap_details *details, pte_t pteval)
> {
> 	if (zap_drop_file_uffd_wp(details))
> 		return;
>
> 	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
> }
>
> If you see it's non-trivial to fully digest all the caller stacks of it. What I
> wanted to do with pte_install_uffd_wp_if_needed is simply to provide a helper
> that can convert any form of uffd-wp ptes into a pte marker before being set as
> none pte. Since uffd-wp can exist in two forms (either present, or swap), then
> cover all these two forms (and for swap form also cover the uffd-wp special pte
> itself) is very clear idea and easy to understand to me. I don't even need to
> worry about who is calling it, and which case can be swap pte, which case must
> not - we just call it when we want to persist the uffd-wp bit (after a pte got
> cleared). That's why in all cases I still prefer to keep it as is, as it just
> makes things straightforward to me.

Ok, that makes sense. I don't think there is an actual problem here; it was
just a little surprising to me, so I was trying to get a better understanding
of the caller stacks and when this might actually be required. As you say
though, that is non-trivial, and in any case it's still ok to install these
bits, and a single function is simpler.

 - Alistair

> Thanks,
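
For anyone following the thread, the rule discussed above can be summarised
as a small userspace model in C. This is only a sketch and not kernel code:
enum pte_kind, struct model_pte, struct model_zap and
model_should_install_marker() are hypothetical names made up for
illustration, and the model folds the zap_drop_file_uffd_wp() check from
zap_install_uffd_wp_if_needed() into the same function. It assumes the only
inputs that matter are the VMA type, the drop flag, and the form of the old
pte value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these types exist in the kernel. */
enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAP };

struct model_pte {
	enum pte_kind kind;
	bool uffd_wp;		/* uffd-wp bit, in present or swap form */
};

struct model_zap {
	bool vma_is_anonymous;	/* stands in for vma_is_anonymous(vma) */
	bool drop_file_uffd_wp;	/* stands in for ZAP_FLAG_DROP_FILE_UFFD_WP */
};

/*
 * Returns true when a uffd-wp marker should be left in the slot.
 * "slot" is the current pte, which must already be cleared;
 * "old" is the value the slot held before it was cleared.
 */
bool model_should_install_marker(const struct model_zap *zap,
				 const struct model_pte *slot,
				 struct model_pte old)
{
	/* Mirrors WARN_ON_ONCE(!pte_none(*pte)) in the quoted patch. */
	assert(slot->kind == PTE_NONE);

	/* Teardown path: the uffd-wp state is dropped with the mapping. */
	if (zap->drop_file_uffd_wp)
		return false;

	/* Anonymous VMAs never get a marker. */
	if (zap->vma_is_anonymous)
		return false;

	/* Present form: a uffd-wp wr-protected normal pte. */
	if (old.kind == PTE_PRESENT && old.uffd_wp)
		return true;

	/* Swap form: e.g. a private shmem mapping after reclaim. */
	if (old.kind == PTE_SWAP && old.uffd_wp)
		return true;

	return false;
}
```

In this model a file-backed zap that does not drop uffd-wp keeps a marker
for both the present and the swap form, while anonymous VMAs and the
teardown path never do, which matches the "don't need to worry about who is
calling it" point above.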