On Sat, Nov 18, 2023 at 3:37 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 17, 2023 at 07:47:00AM +0800, Barry Song wrote:
> > This has been discussed. Steven, Ryan and I all don't think this is a good
> > option. in case we have a large folio with 16 basepages, as do_swap_page
> > can only map one base page for each page fault, that means we have
> > to restore 16(tags we restore in each page fault) * 16(the times of page faults)
> > for this large folio.
>
> That doesn't seem all that hard to fix?  Call set_ptes() instead of
> set_pte_at().  The biggest thing, I guess, is making sure that all
> the PTEs you're going to set up are still pte_none().

I guess you mean that all of the PTEs are still swap entries. Some risks
I can see:

1. The vma might be split after the folio is added to the swapcache, for
   example if userspace unmaps or mprotects part of a large folio.

2. The vma is not split, but some base pages within the folio have been
   MADV_DONTNEED'ed.

3. Base pages in the large folio might end up with different R/W/X
   permissions. For example, if a large folio has 16 base pages, userspace
   is still working at 4KB granularity and can mprotect part of them
   read-only. In this case all 16 PTEs will still be swap entries, but the
   re-use for a write fault can't work at folio granularity.

I need to consider all of the above DoubleMap/split risks rather than simply
checking the PTEs, as userspace is still working at 4KB granularity.

>

Thanks
Barry
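
To make the three cases above concrete, here is a rough userspace sketch in C.
It is a toy model, not kernel code: struct model_pte, batch_swapin_check() and
fill_swap_run() are hypothetical names, and the per-page vma_id and writable
fields are a simplifying assumption standing in for the real vma and
permission state. It only illustrates why "are all PTEs still swap entries?"
is not a sufficient check on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum pte_kind { PTE_NONE, PTE_SWAP, PTE_PRESENT };

struct model_pte {
	enum pte_kind kind;
	unsigned long swap_offset;	/* meaningful only for PTE_SWAP */
	int vma_id;			/* which (modelled) vma covers the page */
	bool writable;			/* modelled per-page write permission */
};

enum batch_result {
	BATCH_OK,
	BATCH_VMA_SPLIT,	/* case 1: range now spans more than one vma */
	BATCH_NOT_SWAP,		/* case 2: an entry is no longer the expected swap entry */
	BATCH_PERM_MISMATCH,	/* case 3: permissions differ inside the range */
};

/* Fill nr entries with consecutive swap offsets in a single vma. */
void fill_swap_run(struct model_pte *ptes, size_t nr,
		   unsigned long first_offset, int vma_id, bool writable)
{
	for (size_t i = 0; i < nr; i++) {
		ptes[i].kind = PTE_SWAP;
		ptes[i].swap_offset = first_offset + i;
		ptes[i].vma_id = vma_id;
		ptes[i].writable = writable;
	}
}

/* Decide whether nr entries could be mapped together in this model. */
enum batch_result batch_swapin_check(const struct model_pte *ptes, size_t nr)
{
	assert(nr > 0);

	for (size_t i = 0; i < nr; i++) {
		if (ptes[i].vma_id != ptes[0].vma_id)
			return BATCH_VMA_SPLIT;
		if (ptes[i].kind != PTE_SWAP ||
		    ptes[i].swap_offset != ptes[0].swap_offset + i)
			return BATCH_NOT_SWAP;
		if (ptes[i].writable != ptes[0].writable)
			return BATCH_PERM_MISMATCH;
	}
	return BATCH_OK;
}
```

In this model, a 16-entry run filled by fill_swap_run() passes the check.
Clearing one entry to PTE_NONE (the MADV_DONTNEED case), changing one vma_id
(the split case), or flipping one writable flag (the mprotect case) each makes
it fail with a different result, even though in the last case every entry is
still a swap entry.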