Re: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads

On Thu, Dec 03, 2020 at 09:59:50PM -0800, Hugh Dickins wrote:
> On Thu, 3 Dec 2020, Andrea Arcangeli wrote:
> > On Thu, Dec 03, 2020 at 09:30:51PM -0500, Peter Xu wrote:
> > > I'm just afraid there's no space left for a migration entry, because migration
> > > entries fill in the pfn information in the swp offset field rather than a real
> > > offset (please refer to make_migration_entry())?  I assume PFN can use any bit.
> > > Or did I miss anything?
> > > 
> > > I went back to see the original proposal from Hugh:
> > > 
> > >   IIUC you only need a single value, no need to carve out another whole
> > >   swp_type: could probably be swp_offset 0 of any swp_type other than 0.
> > > 
> > > Hugh/Andrea, sorry if this is a stupid swap question: could you help explain
> > > why swp_offset=0 won't be used by any swap device?  I believe it's correct,
> > > it's just that I failed to figure out the reason myself. :(
> > > 
> 
> It's because swp_offset 0 is the offset of the swap header, and if we
> ever used that when allocating swap, then the swap header would get
> overwritten, and that swap area become unrecognizable next time.
> 
> But I said it would be usable for UFFD with any swp_type other than 0,
> because a swap entry of type 0, offset 0 is simply 0, which looks just
> like no swap entry at all, and there are (or were: I might not be
> up-to-date) benign races where a swap entry might get passed down but
> then found to be 0, and that was understandable and permitted (yes,
> I still see the "if (!entry.val) goto out;" in __swap_info_get()).
> 
> And that might be related to pte_none() being 0 on most architectures
> (not s390 IIRC): we need to distinguish none from swap.  Though that
> all gets complicated by the way the swp_entry is munged before being
> put into a pte, and the x86 swap munging got more complicated when
> L1TF was revealed (and accompanied by prot none munging too) -
> search git log of v4.19 for x86/speculation/l1tf if you need to.

My thanks to both of you for explaining the details.

> 
> > 
> > Hugh may want to review if I got it wrong, but there are basically three
> > ways.
> > 
> > swp_type would mean adding one more reserved value in addition to
> > SWP_MIGRATION_READ and SWP_MIGRATION_WRITE (kind of increasing
> > SWP_MIGRATION_NUM to 3).
> 
> I'm not very keen on actually using any of the SWP_MIGRATION defines,
> partly because in principle UFFD should not depend on CONFIG_MIGRATION,
> partly because the uffd_wp entry would not behave anything like a
> migration entry (whose pfn should always indicate a locked page).
> 
> swp_offset 0 of swp_type 1 perhaps?
> 
> > 
> > swp_offset = 0 works in combination with SWP_MIGRATION_WRITE and
> > SWP_MIGRATION_READ if we enforce that pfn 0 is never used by the kernel
> > (I'd feel safer with pfn value -1UL truncated to the bits of the swp
> > offset, since the swp_entry format is common code).
> > 
> > The bit I was suggesting is just one more bit like _PAGE_SWP_UFFD_WP
> > from the pte, one that cannot ever be set in any swp entry today. I
> > assume it can't be _PAGE_SWP_UFFD_WP since that already can be set but
> > you may want to verify it...
> 
> I don't see why you would need another bit for this.
> 
> The code that checks non-present non-none entries in page table,
> for whether they are actually swap or migration entries or whatever,
> would now also check for swp_offset 0 of swp_type 1 and go off to
> the UFFD WP processing if so.
> 
> I didn't pay much attention to below, it seemed over-complicated.
> And I don't think Peter's PROT_NONE alternative was unworkable,
> but would have to be more careful about pfn and L1TF than shown.
> And I am more comfortable to focus on the swap-like direction,
> than think in two directions at once - never my strength!

Yes, I think both of them may work, but I'll follow your advice on using swap
entries, assuming that's easier and cleaner than _PAGE_PROTNONE.  For example, it
makes more sense for the current pte_present() to return false for such an
uffd-wp reserved pte.  That way I also won't make _PAGE_PROTNONE even more
complicated.

So I guess I'll start with type==1 && offset==0.

(PS: I still think "swp_entry(0, _UFFD_SWP_UFFD_WP) && !vma_is_anonymous(vma)"
 could also be a good candidate compared to "swp_entry(1, 0)", considering
 type==1 here is kind of randomly chosen from all the numbers other than 0;
 but maybe that's not extremely important - the major logic should be the same)

Thanks!

-- 
Peter Xu