Re: [PATCH] Documentation: update pagemap with SOFT_DIRTY & UFFD_WP shmem issue

Hi, Tiberiu,

On Fri, Aug 20, 2021 at 05:10:20PM +0000, Tiberiu Georgescu wrote:
> Currently, the missing information for shmem is this:
> 1. Difference between is_swap(pte) and is_none(pte).
>     * is_swap(pte) is always false;
>     * is_none(pte) is true when is_swap() should have been;
>     * is_present(pte) is fine.
> 2. swp_entry(pte)
>     Particularly, swp_type() and swp_offset().
> 3. SOFT_DIRTY_BIT
>     This is not always missing for shmem. 
>     Once 4 is written to clear_refs, if the page is dirtied, the bit is fine as long as it
>     is still in memory. If the page is swapped out, the bit is lost. Then, if the page is
>     brought back into memory, the bit is still lost.
> 
> For 1, you mentioned how lseek() and madvise() can be used to get this
> information [2], and I proposed a different method with a little help from
> the current pagemap[3]. They have slightly different output and applications, so
> the difference should be taken into consideration.
> For 2, if anyone knows of a way to retrieve the missing information cleanly,
> please let us know.
> As for 3, AFAIK, we will need to leverage Peter's special PTE marker mechanism
> and implement it in another patch.
> 
> [2]: https://lore.kernel.org/lkml/5766d353-6ff8-fdfa-f8f9-764e8de9b5aa@xxxxxxxxxx/
> [3]: https://lore.kernel.org/lkml/B130B700-B3DB-4D07-A632-73030BCBC715@xxxxxxxxxxx/
> 
> ============================
> For completeness, I would like to mention Peter's RFC[4] and my own patch[5],
> which deal with adding missing functionality to the pagemap when pages are
> shmem/tmpfs.
> 
> Peter's patch[4] adds the missing information at 1 to the pagemap, with very
> little performance overhead. AFAIK, it is still WIP.
> 
> My patch[5] fixes both 1 and 2, at the expense of a significant loss in performance
> when dealing with swapped out shared pages. This performance loss can be
> reduced with batching, for use cases when high performance matters. Also, this
> patch on top of Peter's RFC yields better performance[6]. It is still 2x
> slower on average than pre-patch.
> 
> Peter's patch has a config flag, and I intend to add one to mine in the next
> version. So I wanted to propose, if alternatives are not implemented yet (mincore,
> lseek, map_files or otherwise are insufficient), we upstream our patches (once
> they are ready), so that users can toggle them on or off, depending on whether
> they need the extra functionality or not. And, of course, document their usage.
> 
> If neither sounds like a particularly useful/convenient option, we might need to
> look into designs of retrieving the missing information via another mechanism
> (sys/fs, ioctl, netlink etc).
> 
> That is, unless we find that we can/should still place this info in the pagemap,
> for the sake of correctness and completeness. For that though, we should agree
> on what we expect the pagemap to do in the end. Is shmem/tmpfs out of
> bounds for it or not?
> 
> [4]: https://lore.kernel.org/lkml/20210807032521.7591-1-peterx@xxxxxxxxxx/
> [5]: https://lore.kernel.org/lkml/20210730160826.63785-1-tiberiu.georgescu@xxxxxxxxxxx/
> [6]: https://lore.kernel.org/lkml/C0DB3FED-F779-4838-9697-D05BE96C3514@xxxxxxxxxxx/

Thanks for summarizing the issues.
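
For reference on items 1-3 in the quoted summary, here is a minimal sketch of
how a 64-bit pagemap entry can be decoded. It assumes the bit layout given in
Documentation/admin-guide/mm/pagemap.rst: bit 63 present, bit 62 swapped,
bit 61 file-page or shared-anon, bit 55 soft-dirty, and bits 0-54 holding
either the PFN (if present) or the swap type in bits 0-4 and the swap offset
in bits 5-54 (if swapped). The names `struct pm_info` and `pm_decode()` are
made up for illustration; they are not kernel or libc APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as documented in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_PRESENT    (1ULL << 63)
#define PM_SWAP       (1ULL << 62)
#define PM_FILE       (1ULL << 61)          /* file-page or shared-anon */
#define PM_SOFT_DIRTY (1ULL << 55)
#define PM_LOW_MASK   ((1ULL << 55) - 1)    /* bits 0-54 */

/* Hypothetical helper type, for illustration only. */
struct pm_info {
	bool present;
	bool swapped;
	bool file_or_shared;
	bool soft_dirty;
	uint64_t pfn;          /* meaningful only if present */
	unsigned swap_type;    /* meaningful only if swapped */
	uint64_t swap_offset;  /* meaningful only if swapped */
};

/* Split one raw pagemap entry into its documented fields. */
static struct pm_info pm_decode(uint64_t entry)
{
	struct pm_info info = {0};
	uint64_t low = entry & PM_LOW_MASK;

	info.present        = (entry & PM_PRESENT) != 0;
	info.swapped        = (entry & PM_SWAP) != 0;
	info.file_or_shared = (entry & PM_FILE) != 0;
	info.soft_dirty     = (entry & PM_SOFT_DIRTY) != 0;

	if (info.present) {
		info.pfn = low;
	} else if (info.swapped) {
		info.swap_type   = (unsigned)(low & 0x1f);  /* bits 0-4 */
		info.swap_offset = low >> 5;                /* bits 5-54 */
	}
	return info;
}
```

With the behaviour described in the summary, a swapped-out shmem page does not
have the swap bit set in its entry, so it decodes here as neither present nor
swapped, and the swap type and offset are simply not available.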

Before going further, I would really like answers to a couple of questions that
I already raised in the other thread here:

https://lore.kernel.org/lkml/YR%2F+gfL8RCP8XoB1@t490s/

They're:

  (1) Does mincore() already suit your needs?

  (2) What would you like to do with swap entries in pagemap?
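
To make question (1) concrete, below is a minimal sketch of a mincore()-based
residency check. The helper name `count_resident()` is hypothetical, and the
sketch assumes Linux with glibc and a page-aligned start address. Note that
mincore() only reports whether each page is resident; it does not give the
swap type or offset.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the pages in [addr, addr + len) that mincore() reports as resident.
 * addr must be page-aligned. Returns -1 on error.
 * Hypothetical helper, for illustration only.
 */
static long count_resident(void *addr, size_t len)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t npages;
	unsigned char *vec;
	long resident = 0;

	if (psz <= 0 || len == 0)
		return -1;

	/* mincore() fills one byte per page in the range. */
	npages = (len + (size_t)psz - 1) / (size_t)psz;
	vec = malloc(npages);
	if (!vec)
		return -1;

	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}

	/* Only the least significant bit of each byte is defined. */
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1;

	free(vec);
	return resident;
}
```

The result is only a snapshot: a page can be swapped out or faulted back in
right after the call returns.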

I'm more interested in question (2) because I never figured it out before, and
I really don't see how it would work even if the kernel could expose the swap
format to userspace.  E.g., right after you decide to "zero copy" that page, the
page can be faulted in before live migration finishes, and it can be dirtied
again.  Then the page on the shared network storage will be stale, and so will
the swap entry you just scanned.

Thanks,

-- 
Peter Xu



