On Fri, Sep 25, 2020 at 2:13 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Fri, Sep 25, 2020 at 12:56:05PM -0700, Linus Torvalds wrote: > > So I think we can simply add a > > > > if (page_mapcount(page) != 1) > > return false; > > > > to page_maybe_dma_pinned(), and that very naturally protects against > > the "is the page count perhaps elevated due to a lot of forking?" > > How about the MAP_SHARED case where the page is pinned by some process but also > shared (so mapcount can be >1)? MAP_SHARED doesn't matter, since it's not getting COW'ed anyway, and we keep the page around regardless. So MAP_SHARED is the easy case. We'll never get to any of this code, because is_cow_mapping() won't be true. You can still screw up MAP_SHARED if you do a truncate() on the underlying file or something like that, but that most *definitely* falls under the "you only have yourself to blame" heading. > Would the ATOMIC version always work? I mean, I thought it could fail anytime, > so any fork() can start to fail for the tests too. Sure. I'm not really happy about GFP_ATOMIC, but I suspect it works in practice. Honestly, if somebody first pins megabytes of memory, and then does a fork(), they are doing some seriously odd and wrong things. So I think this should be a "we will try to handle it gracefully, but your load is broken" case. I am still inclined to add some kind of warning to this case, but I'm also a bit on the fence wrt the whole "convenience" issue - for some very occasional use it's probably convenient to not have to worry about this in user space. Actually, what I'm even less happy about than the GFP_ATOMIC is how much annoying boilerplate this just "map anonymous page" required with the whole cgroup_charge, throttle, anon_rmap, lru_cache_add thing. Looking at that patch, it all looks _fairly_ simple, but there's a lot of details that got duplicated from the pte_none() new-page-fault case (and that the do_cow_page() case also shares) I understand why it happens, and there's not *that* many cases, it made me go "ouch, this is a lot of small details, maybe I missed some", and I got the feeling that I should try to re-organize a few helper functions to avoid duplicating the same basic code over and over again. But I decided that I wanted to keep the patch minimal and as focused as possible, so I didn't actually do that. But we clearly have decades of adding rules that just makes even something as "simple" as "add a new page to a VM" fairly complex. Also, to avoid making the patch bigger, I skipped your "pass destination vma around" patch. I think it's the right thing conceptually, but everything I looked at also screamed "we don't actually care about the differences" to me. I dunno. I'm conflicted. This really _feels_ to me like "we're so close to just fixing this once and for all", but then I also go "maybe we should just revert everything and do this for 5.10". Except "reverting everything" is sadly really really problematic too. It will fix the rdma issue, but in this case "everything" goes all the way back to "uhhuh, we have a security issue with COW going the wrong way". Otherwise I'd have gone that way two weeks ago already. Linus