On Thu, Sep 28, 2023 at 11:34 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Thu, Sep 28, 2023 at 07:51:18PM +0200, David Hildenbrand wrote: > > On 28.09.23 19:21, Peter Xu wrote: > > > On Thu, Sep 28, 2023 at 07:05:40PM +0200, David Hildenbrand wrote: > > > > As described as reply to v1, without fork() and KSM, the PAE bit should > > > > stick around. If that's not the case, we should investigate why. > > > > > > > > If we ever support the post-fork case (which the comment above remap_pages() > > > > excludes) we'll need good motivation why we'd want to make this > > > > overly-complicated feature even more complicated. > > > > > > The problem is DONTFORK is only a suggestion, but not yet restricted. If > > > someone reaches on top of some !PAE page on src it'll never gonna proceed > > > and keep failing, iiuc. > > > > Yes. It won't work if you fork() and not use DONTFORK on the src VMA. We > > should document that as a limitation. > > > > For example, you could return an error to the user that can just call > > UFFDIO_COPY. (or to the UFFDIO_COPY from inside uffd code, but that's > > probably ugly as well). > > We could indeed provide some special errno perhaps upon the PAE check, then > document it explicitly in the man page and suggest resolutions (like > DONTFORK) when user hit it. > > > > > > > > > do_wp_page() doesn't have that issue of accuracy only because one round of > > > CoW will just allocate a new page with PAE set guaranteed, which is pretty > > > much self-heal and unnoticed. > > > > Yes. But it might have to copy, at which point the whole optimization of > > remap is gone :) > > Right, but that's fine IMHO because it should still be very corner case, > definitely not expected to be the majority to start impact the performance > results. > > > > > > > > > So it'll be great if we can have similar self-heal way for PAE. If not, I > > > think it's still fine we just always fail on !PAE src pages, but then maybe > > > we should let the user know what's wrong, e.g., the user can just forgot to > > > apply DONTFORK then forked. And then the user hits error and don't know > > > what happened. Probably at least we should document it well in man pages. > > > > > Yes, exactly. > > > > > Another option can be we keep using folio_mapcount() for pte, and another > > > helper (perhaps: _nr_pages_mapped==COMPOUND_MAPPED && _entire_mapcount==1) > > > for thp. But I know that's not ideal either. > > > > As long as we only set the pte writable if PAE is set, we're good from a CVE > > perspective. The other part is just simplicity of avoiding all these > > mapcount+swapcount games where possible. > > > > (one day folio_mapcount() might be faster -- I'm still working on that patch > > in the bigger picture of handling PTE-mapped THP better) > > Sure. > > For now as long as we're crystal clear on the possibility of inaccuracy of > PAE, it never hits besides fork() && !DONTFORK, and properly document it, > then sounds good here. Ok, sounds like we have a consensus. I'll prepare manpage changes to document the DONTFORK requirement for uffd_remap. > > Thanks, > > -- > Peter Xu >