Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Jun 23, 2022, at 4:49 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:

> On Thu, Jun 23, 2022 at 11:35:00PM +0000, Nadav Amit wrote:
>> On Jun 23, 2022, at 4:24 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:
>> 
>>> ⚠ External Email
>>> 
>>> On Wed, Jun 22, 2022 at 11:50:35AM -0700, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@xxxxxxxxxx>
>>>> 
>>>> Using a PTE on x86 with cleared access-bit (aka young-bit)
>>>> takes ~600 cycles more than when the access bit is set. At the same
>>>> time, setting the access-bit for memory that is not used (e.g.,
>>>> prefetched) can introduce greater overheads, as the prefetched memory is
>>>> reclaimed later than it should be.
>>>> 
>>>> Userfaultfd currently does not set the access-bit (excluding the
>>>> huge-pages case). Arguably, it is best to let the user control whether
>>>> the access bit should be set or not. The expected use is to request
>>>> userfaultfd to set the access-bit when the copy/wp operation is done to
>>>> resolve a page-fault, and not to set the access-bit when the memory is
>>>> prefetched.
>>>> 
>>>> Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request the
>>>> young bit to be set.
>>>> 
>>>> Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>> Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
>>>> Cc: Peter Xu <peterx@xxxxxxxxxx>
>>>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>>>> Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>>>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>>> 
>>> Hmm.. is the hugetlb code overlooked (for both of the hints), or maybe I
>>> missed it? Do we need to cover them too? Thanks,
>> 
>> I do not know what the value of not setting the PTE’s access/dirty when it
>> comes to performance. The pages won’t be swapped out, just as you wrote in
>> your comment in hugetlb_mcopy_atomic_pte():
>> 
>>        /*
>>         * Always mark UFFDIO_COPY page dirty; note that this may not be
>>         * extremely important for hugetlbfs for now since swapping is not
>>         * supported, but we should still be clear in that this page cannot be
>>         * thrown away at will, even if write bit not set.
>>         */
>>        _dst_pte = huge_pte_mkdirty(_dst_pte);
>>        _dst_pte = pte_mkyoung(_dst_pte);
>> 
>> If you want for consistency/robustness not to set dirty on read-only
>> entries, that’s something that I can do.
> 
> We have more than one options here I guess.
> 
> We could have the hints not available for hugetlbfs, but then we'll both
> need to add special document for hugetlbfs (when you write the man-page,
> let's say) and also it'll be probably better to fail the ioctls with hints
> when applying to hugetlb vmas.
> 
> Leaving them silent to hugetlbfs is another option, it just sounds weird to
> me without any kind of documentation so I like this one least.
> 
> Or we could make them work for hugetlbfs too.  Not saying that swap will
> work some day (but it still could?!) it just seems all consistent as you
> said.
> 
> So I'd prefer the last one, but only if you agree.

My take is that hints are hints. Following David’s (or was it yours?)
feedback, I fixed the description to indicate that this is merely a hint and
removed all references to dirty/access bits. The kernel therefore can ignore
the hint when it wants to or use it in any other way. I fully agree that
this gives the kernel the ability to change the behavior as needed.

Note that for write-protected 4KB zero-page (where we share the zero-page)
we always set the access-bit, regardless of the hint, because it makes
sense: the zero-page is not swappable and therefore the access-bit is set.

I think that the lesser user-facing documentation there is on how the
feature is *exactly* used by the kernel - is better from an API point of
view.

So I see no reason to fail or be forced not to set a page as young, just
because a hint was *not* provided. This would even be a regression in the
behavior. The hint is actually always respected right now, it is just that
even if you do not provide the hint, the access/dirty is set.

The only consistency I think worth thinking about is with the dirty-bit, and
I can add it if you want. Note that the access-bit (in x86) might be set
speculatively in contrast to the dirty-bit is only set atomically with a
real access. That’s the reason I think it may make sense not to set the
dirty without a hint.

Is that acceptable? Access-bit always set, dirty-bit according to hint?





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux