Re: [PATCH 0/1] mm: restore full accuracy in COW page reuse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 9, 2021 at 6:51 PM Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>
> I just don't see the simplification coming from
> 09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking
> page_mapcount above as an optimization, to me it looks much simpler to
> check it in a single place, in do_wp_page, that in addition of
> optimizing away the superfluous copy, would optimize away the above
> complexity as well.

Here's the difference:

 (a) in COW, "page_mapcount()" is pure and utter garbage, and has zero meaning.

Why? Because MAPCOUNT DOES NOT MATTER FOR COW.

COW is about "I'm about to write to this page, and that means I need
an _exclusive_ page so that I don't write to a page that somebody else
is using".

Can you admit that fundamental fact?

Notice how "page_mapcount()" has absolutely NOTHING to do with
"exclusive page". There are lots of other ways the page can be used
aside from mapcount. The page cache may have a reference to the page.
Somebody that did a GUP may have a reference to the page.

So what actually matters at COW time? The only thing that matters is
"am I the exclusive owner". And guess what? We have a count of page
references. It's "page_count()". That's *EXACTLY* the thing that says
"are there maybe other references to this page".

In other words, COW needs to use page_count(). It really is that easy.
End of story.

So, given that, why do I then do

> +     if (page_mapcount(page) != 1)
> +             return false;

in my patch, when I just told you that "page_mapcount()" is irrelevant for COW?

Guess what? The above isn't about COW. The above isn't about whether
we need to do a copy in order to be able to write to the page without
anybody else being affected by it.

No, at fork time, and at this clear_refs time, the question is
entirely different. The question is not "Do I have exclusive access to
the page", but it is "Did I _already_ made sure that I have exclusive
access to the page because I pinned it"?

See how different the question is?

Because *if* you have done a pinned COW for soem direct-IO read, and
*if* that page is dirty, then you know it's mapped only in your
address space. You're basically doing the _reverse_ of the COW test,
and asking yourself "is this my own private pinned page"?

And then it's actually perfectly sane to do a check that says
"obviously, if somebody else has this page mapped, then that's not the
case".

See?

For COW, "page_mapcount()" is pure and utter garbage, and entirely
meaningless. How many places it's mapped in doesn't matter. You may
have to COW even if it's only mapped in your address space (page
cache, GUP, whatever).

But for "did I already make this exclusive", then it's actually
meaningful to say "is it mapped somewhere else". We know it has other
users - that "page_may_be_pinned()" in fact *guarantees* that it has
other users. But we're double-checking that the other users aren't
other mappings.

That said, I did just realize that that "page_mapcount()" check is
actually pointless.

Because we do have a simpler one. Instead of checking whether all
those references that made us go "page_might_be_pinned()" aren't other
mappings, the simple check for "pte_writable()" would already have
told us that we had already done the COW.

So you are actually right that the page_mapcount() test in my patch is
not the best way to check for this. By the time we see
"page_may_be_pinned()", we might as well just say "Oh, it's a private
mapping and the pte is already writable, so we know we were the
exclusive mapper, because COW and fork() already guarantee that".

> And I won't comment if it's actually safe to skip random pages or
> not. All I know is for mprotect and uffd-wp, definitely the above
> approach wouldn't work.

Why do you say that?

You say ":definitely know", but I think you're full of it.

The fact is, if you have a pinned page, why wouldn't we say "you can't
turn it read-only"?

It's pinned in the VM address space - and it's pinned writable. Simple
and clear semantics. You can *remove* it, but you can't change the
pinning.

              Linus




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux