Re: [PATCH v3 2/2] s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests

Christian Borntraeger <borntraeger@xxxxxxxxxxxxx> · Thu, 18 Apr 2024 15:09:20 +0200

On 16.04.24 at 15:41, David Hildenbrand wrote:
On 16.04.24 14:02, Christian Borntraeger wrote:

On 16.04.24 at 08:37, Alexander Gordeev wrote:

We could piggy-back on vm_fault_to_errno(). We could use
vm_fault_to_errno(rc, FOLL_HWPOISON), and only continue (retry) if the rc is 0 or
-EFAULT, otherwise fail with the returned error.

But I'd do that as a follow up, and also use it in break_ksm() in the same fashion.
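
For illustration only, the retry policy suggested above could look roughly like the sketch below. It is a userspace model, not kernel code: the bit values are placeholders, vm_fault_to_errno() is re-implemented here to mirror the kernel helper's mapping as I understand it, and unshare_should_retry() is a hypothetical name that does not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* Linux value; fallback for libcs that lack it */
#endif

/*
 * Placeholder bit values for this userspace model only. The kernel's real
 * VM_FAULT_* and FOLL_* definitions live in its own headers.
 */
typedef unsigned int vm_fault_t;
#define VM_FAULT_OOM            0x01u
#define VM_FAULT_SIGBUS         0x02u
#define VM_FAULT_HWPOISON       0x04u
#define VM_FAULT_HWPOISON_LARGE 0x08u
#define VM_FAULT_SIGSEGV        0x10u
#define FOLL_HWPOISON           0x01

/* Userspace stand-in mirroring the kernel helper's fault-to-errno mapping. */
static int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
{
	if (vm_fault & VM_FAULT_OOM)
		return -ENOMEM;
	if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
		return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
	if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
		return -EFAULT;
	return 0;
}

/*
 * Hypothetical helper: decide what to do after an unsharing fault.
 * Returns true if the caller should look at the page table again (retry);
 * otherwise returns false and stores the error to propagate in *err.
 */
static bool unshare_should_retry(vm_fault_t rc, int *err)
{
	int ret = vm_fault_to_errno(rc, FOLL_HWPOISON);

	assert(ret <= 0); /* the mapping yields 0 or a negative errno */

	/* 0 and -EFAULT: continue, the next lookup tells us what happened. */
	if (ret == 0 || ret == -EFAULT) {
		*err = 0;
		return true;
	}

	/* Anything else (e.g. -ENOMEM, -EHWPOISON): fail with that error. */
	*err = ret;
	return false;
}
```

The same decision could then be shared with break_ksm(), which is the point of doing this as a follow-up.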

@Christian, do you agree with this suggestion?

I would need to look into that more closely to give a proper answer. In general I am ok
with this but I prefer to have more eyes on that.
From what I can tell we should cover all the normal cases with our CI as soon as it hits
next. But maybe we should try to create/change a selftest to trigger these error cases?

If we find a shared zeropage we expect the next unsharing fault to succeed except:

(1) OOM, in which case we translate to -ENOMEM.

(2) Some obscure race with MADV_DONTNEED paired with concurrent truncate(), in which case we get an error, but if we look again, we will find the shared zeropage no longer mapped. (this is what break_ksm() describes)

(3) MCE while copying the page, which doesn't quite apply here.

For the time being, we only get shared zeropages in (a) anon mappings (b) MAP_PRIVATE shmem mappings via UFFDIO_ZEROPAGE. So (2) is hard or even impossible to trigger. (1) is hard to test as well, and (3) ...

No easy way to extend selftests that I can see.

Yes, let's just go forward.

If we repeatedly find a shared zeropage in a COW mapping and get an error from the unsharing fault, something else would be deeply flawed. So I'm not really worried about that, but I agree that having a more centralized check will make sense.