On Sat, Jan 9, 2021 at 6:51 PM Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote: > > I just don't see the simplification coming from > 09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking > page_mapcount above as an optimization, to me it looks much simpler to > check it in a single place, in do_wp_page, that in addition of > optimizing away the superfluous copy, would optimize away the above > complexity as well. Here's the difference: (a) in COW, "page_mapcount()" is pure and utter garbage, and has zero meaning. Why? Because MAPCOUNT DOES NOT MATTER FOR COW. COW is about "I'm about to write to this page, and that means I need an _exclusive_ page so that I don't write to a page that somebody else is using". Can you admit that fundamental fact? Notice how "page_mapcount()" has absolutely NOTHING to do with "exclusive page". There are lots of other ways the page can be used aside from mapcount. The page cache may have a reference to the page. Somebody that did a GUP may have a reference to the page. So what actually matters at COW time? The only thing that matters is "am I the exclusive owner". And guess what? We have a count of page references. It's "page_count()". That's *EXACTLY* the thing that says "are there maybe other references to this page". In other words, COW needs to use page_count(). It really is that easy. End of story. So, given that, why do I then do > + if (page_mapcount(page) != 1) > + return false; in my patch, when I just told you that "page_mapcount()" is irrelevant for COW? Guess what? The above isn't about COW. The above isn't about whether we need to do a copy in order to be able to write to the page without anybody else being affected by it. No, at fork time, and at this clear_refs time, the question is entirely different. The question is not "Do I have exclusive access to the page", but it is "Did I _already_ made sure that I have exclusive access to the page because I pinned it"? See how different the question is? Because *if* you have done a pinned COW for soem direct-IO read, and *if* that page is dirty, then you know it's mapped only in your address space. You're basically doing the _reverse_ of the COW test, and asking yourself "is this my own private pinned page"? And then it's actually perfectly sane to do a check that says "obviously, if somebody else has this page mapped, then that's not the case". See? For COW, "page_mapcount()" is pure and utter garbage, and entirely meaningless. How many places it's mapped in doesn't matter. You may have to COW even if it's only mapped in your address space (page cache, GUP, whatever). But for "did I already make this exclusive", then it's actually meaningful to say "is it mapped somewhere else". We know it has other users - that "page_may_be_pinned()" in fact *guarantees* that it has other users. But we're double-checking that the other users aren't other mappings. That said, I did just realize that that "page_mapcount()" check is actually pointless. Because we do have a simpler one. Instead of checking whether all those references that made us go "page_might_be_pinned()" aren't other mappings, the simple check for "pte_writable()" would already have told us that we had already done the COW. So you are actually right that the page_mapcount() test in my patch is not the best way to check for this. By the time we see "page_may_be_pinned()", we might as well just say "Oh, it's a private mapping and the pte is already writable, so we know we were the exclusive mapper, because COW and fork() already guarantee that". > And I won't comment if it's actually safe to skip random pages or > not. All I know is for mprotect and uffd-wp, definitely the above > approach wouldn't work. Why do you say that? You say ":definitely know", but I think you're full of it. The fact is, if you have a pinned page, why wouldn't we say "you can't turn it read-only"? It's pinned in the VM address space - and it's pinned writable. Simple and clear semantics. You can *remove* it, but you can't change the pinning. Linus