Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 24, 2022 at 05:58:17PM -0400, Peter Xu wrote:
> [Sorry for replying late]
> 
> On Fri, Jun 24, 2022 at 02:42:21AM +0000, Nadav Amit wrote:
> > 
> > 
> > > On Jun 23, 2022, at 7:05 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > 
> > > On Fri, Jun 24, 2022 at 12:03:38AM +0000, Nadav Amit wrote:
> > >> My take is that hints are hints. Following David’s (or was it yours?)
> > >> feedback, I fixed the description to indicate that this is merely a hint and
> > >> removed all references to dirty/access bits. The kernel therefore can ignore
> > >> the hint when it wants to or use it in any other way. I fully agree that
> > >> this gives the kernel the ability to change the behavior as needed.
> > >> 
> > >> Note that for write-protected 4KB zero-page (where we share the zero-page)
> > >> we always set the access-bit, regardless of the hint, because it makes
> > >> sense: the zero-page is not swappable and therefore the access-bit is set.
> > > 
> > > The zero-page example makes sense, and yeah that makes the hugetlb behavior
> > > making more sense too.
> > > 
> > >> 
> > >> I think that the lesser user-facing documentation there is on how the
> > >> feature is *exactly* used by the kernel - is better from an API point of
> > >> view.
> > >> 
> > >> So I see no reason to fail or be forced not to set a page as young, just
> > >> because a hint was *not* provided. This would even be a regression in the
> > >> behavior. The hint is actually always respected right now, it is just that
> > >> even if you do not provide the hint, the access/dirty is set.
> > >> 
> > >> The only consistency I think worth thinking about is with the dirty-bit, and
> > >> I can add it if you want. Note that the access-bit (in x86) might be set
> > >> speculatively in contrast to the dirty-bit is only set atomically with a
> > >> real access. That’s the reason I think it may make sense not to set the
> > >> dirty without a hint.
> > > 
> > > Sorry to ask if this is (another) naive question: any link/help to explain
> > > the speculative behavior on access bit? Is it part of speculative
> > > execution (which, iiuc, would it be reverted if the speculation failed)?
> > 
> > Oh man, it is hard to find a reference. I made this claim it based on my
> > recollection (and logic).
> > 
> > The access-bit on Intel is set when the PTE is loaded into the TLB, so if you
> > allow speculative loading of the TLB, that’s what you get.
> > 
> > Googling shows Yu Zhao saying: "IIRC, there are also false positives, i.e.,
> > the accessed bit is set on entries used by speculative execution only.” [1]
> > 
> > Intel SDM says: "Whenever the processor uses a paging-structure entry as part
> > of linear-address translation, it sets the accessed flag in that entry...
> > Whenever there is a write to a linear address, the processor sets the dirty
> > flag (if it is not already set) in the paging- structure entry..."
> > 
> > You can argue that this indicates that the access-bit is updated
> > speculatively (translations can be speculative) and dirty-bit is on actual
> > write. But it is somewhat of a creative reading.
> > 
> > Googling further did not help much, but I found a relevant discussion on
> > RISC-V, in which they actually consider a similar behavior. [2]
> > 
> > If you want (and care), we can cc Dave Hansen to get a clear answer.
> > 
> > [1] https://lore.kernel.org/lkml/YE7Rk%2FYA1Uj7yFn2@xxxxxxxxxx/
> > [2] https://lists.riscv.org/g/tech-virt-mem/topic/accessed_bit/77699883?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,1,80,77699883
> 
> I thought even writes can be speculatively executed too?  Though I think
> when the speculation was proved wrong the write needs to be reverted along
> with making sure D bit cleared if it was cleared before the speculative
> operation.
> 
> So I think I get you if you meant the access bit may not be reverted even
> if we hit a speculative failure (though without solid proofs, afaict..).
> IOW we could have false positive access bits set even if not accessed, but
> not to D bits which should be accurate.
> 
> > 
> > > 
> > >> 
> > >> Is that acceptable? Access-bit always set, dirty-bit according to hint?
> > > 
> > > I'm still trying to digest what you said above, sorry.
> > > 
> > > Aren't both access and dirty bits need an atomic op to be set anyway? Then
> > > from perf pov should we simply keep setting them both too like what you did
> > > with this version? because it seems that'll always avoid an extra pgtable
> > > update access?
> > 
> > I guess by atomic-op you mean atomic-update by the hardware AD-assist.
> 
> Yes.
> 
> Btw, since I looked at the SDM as you quoted I think that may not strictly
> be like an atomic op from processor pov, I guess, since there's a NOTE:
> 
>   The accesses used by the processor to set these flags may or may not be
>   exposed to the processor’s self-modifying code detection logic. If the
>   processor is executing code from the same memory area that is being used
>   for the paging structures, the setting of these flags may or may not
>   result in an immediate change to the executing code stream.
> 
> So I read it as: even if it'll be an atomic, the op can be postponed.
> 
> > 
> > I agree that if a page is written, the bits would need to be updated and
> > these would introduce an overhead. However, if the page cannot be written,
> > well, the dirty bit would never be set.
> 
> Ok I see what you mean now.  But honestly, I don't think it's anything
> related to the speculative access bit behavior described above.. or is it?
> 
> > 
> > hugetlb_mcopy_atomic_pte() currently does the following:
> > 
> >         _dst_pte = huge_pte_mkdirty(_dst_pte);
> >         _dst_pte = pte_mkyoung(_dst_pte);
> > 
> >         if (wp_copy)
> >                 _dst_pte = huge_pte_mkuffd_wp(_dst_pte);
> > 
> > Since you asked to update hugetlb_mcopy_atomic_pte(), I can offer three
> > options:
> > 
> > 1. Do not set dirty if (wp_copy).
> > 2. Do not set dirty if (wp_copy || !write_hint) 
> > 3. Keep it as is.
> 
> AFAICT you already go somewhere at least not (3) with non-hugetlb pages in
> current series.. because dirty bit is not always set already for them, so
> I'd say we'd make them match?  Hugetlbfs shouldn't be special in this
> aspect, IMHO.
> 
> Said that, I think it doesn't really necessary need to be that complex,
> since make_huge_pte() already sets dirty bit when "writable=1", so IIUC
> what you need to do is simply make sure dirty bit set when write_hint=1.
> 
> Does it sounds correct to you?

Hmm, hold on...  I failed to figure out how that write-likely hint could
help us for either huge or non-huge pages, since:

  (1) Old code always set dirty, so no perf degrade anyway with/without the
      hint

  (2) If we want to rework dirty bit (which I'm totally fine with..), then
      we don't apply it when we shouldn't, and afaict we should set D bit
      whenever we should...  if the user assumes this page is likely to be
      written but made it read-only, say, with UFFDIO_COPY(wp_mode=1),
      setting D bit will not help, instead, the user should simply use an
      UFFDIO_COPY(wp_mode=0) then the dirty will be set with write=1..

It'll be helpful but only helpful for UFFDIO_ZEROCOPY because it avoids one
COW.  But that seems to be it.

In short: I'm wondering whether we only really need the ACCESS_LIKELY hint
as you proposed earlier.  We may want UFFDIO_ZEROPAGE_MODE_ALLOCATE
separately, but keep that only for zeropage op (and it shouldn't really be
called WRITE_LIKELY)?  Or did I miss something?

-- 
Peter Xu





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux