Hello Boris,
On 10/31/2022 4:15 PM, Borislav Petkov wrote:
On Mon, Oct 31, 2022 at 03:10:16PM -0500, Kalra, Ashish wrote:
Just to add here, writing to any of these pages from the Host
will trigger a RMP #PF which will cause the RMP page fault handler
to send a SIGBUS to the current process, as this page is not owned
by Host.
And kill the host process?
So this is another "policy" which sounds iffy. If we kill the process,
we should at least say why. Are we doing that currently?
Yes, pasted below is the latest host RMP #PF handler with additional
comments, including one that documents this behavior:
static int handle_user_rmp_page_fault(struct pt_regs *regs,
				      unsigned long error_code,
				      unsigned long address)
{
	...
	...
	/*
	 * If it's a guest private page, then the fault cannot be resolved.
	 * Send a SIGBUS to terminate the process.
	 *
	 * As documented in APM vol3 pseudo-code for RMPUPDATE, when the
	 * 2M range is covered by a valid (Assigned=1) 2M entry, the middle
	 * 511 4k entries also have Assigned=1. This means that if there is
	 * an access to a page which happens to lie within an Assigned 2M
	 * entry, the 4k RMP entry will also have Assigned=1. Therefore, the
	 * kernel should see that the page is not a valid page and the fault
	 * cannot be resolved.
	 */
	if (snp_lookup_rmpentry(pfn, &rmp_level)) {
		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
		return RMP_PF_RETRY;
	}
	...
	...
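To illustrate the property the comment relies on, here is a small standalone
userspace model (not kernel code). The struct, its fields and the toy_*
helpers are all hypothetical, and the table is a plain array rather than the
real RMP layout; it only shows why a 4k lookup anywhere inside an Assigned 2M
range reports Assigned=1:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES_PER_2M 512u	/* number of 4k pages in one 2M region */

/* Hypothetical, simplified RMP entry: only the bits discussed above. */
struct toy_rmpentry {
	bool assigned;	/* models Assigned=1 */
	bool is_2m;	/* set only on the 2M-aligned head entry */
};

/*
 * Model assigning a whole 2M region: the head entry and the following
 * 511 4k entries all end up with assigned set.
 * Assumption: base_pfn is 2M-aligned and the table is large enough.
 */
static void toy_assign_2m(struct toy_rmpentry *rmp, uint64_t base_pfn)
{
	for (uint64_t i = 0; i < TOY_PAGES_PER_2M; i++) {
		rmp[base_pfn + i].assigned = true;
		rmp[base_pfn + i].is_2m = (i == 0);
	}
}

/* A 4k lookup only needs to read the entry for that exact pfn. */
static bool toy_pfn_assigned(const struct toy_rmpentry *rmp, uint64_t pfn)
{
	return rmp[pfn].assigned;
}
```

With this model, a lookup of any pfn in [base_pfn, base_pfn + 511] returns
true after toy_assign_2m(), which is the case where the handler sends SIGBUS.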
I believe we already had an off-list discussion on this; copying
David Kaplan's reply below:
So what I think you want to do is:
1. Compute the pfn for the 4kb page you're trying to access (as your
   code below does)
2. Read that RMP entry -- If it is assigned then kill the process
3. Otherwise, check the level from the host page table. If
   level=PG_LEVEL_4K then somebody else may have already smashed this
   page, so just retry the instruction
4. If level=PG_LEVEL_2M/1G, then the host needs to split their page.
This is the current algorithm being followed by the host RMP #PF handler.
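As a rough sketch, the four steps above can be written as a standalone
decision function. This is a userspace model and not the actual handler: the
toy_* names and enums are hypothetical, the RMP lookup result is passed in as
a boolean, and the pfn helper assumes naturally aligned huge mappings:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the host page-table mapping levels. */
enum toy_pg_level { TOY_PG_LEVEL_4K, TOY_PG_LEVEL_2M, TOY_PG_LEVEL_1G };

/* Possible outcomes of the host RMP #PF handling steps. */
enum toy_rmp_action {
	TOY_RMP_KILL,	/* step 2: page is assigned, send SIGBUS */
	TOY_RMP_RETRY,	/* step 3: mapping is already 4k, retry the access */
	TOY_RMP_SPLIT,	/* step 4: huge mapping, host must split its page */
};

/*
 * Step 1: pfn of the 4k page being accessed, given the base pfn of the
 * (possibly huge) host mapping and the faulting address.
 */
static uint64_t toy_4k_pfn(uint64_t base_pfn, uint64_t address,
			   enum toy_pg_level level)
{
	uint64_t pages = 1;

	if (level == TOY_PG_LEVEL_2M)
		pages = 512;
	else if (level == TOY_PG_LEVEL_1G)
		pages = 512 * 512;

	/* Index of the 4k page inside the mapping, added to its base. */
	return base_pfn + ((address >> 12) & (pages - 1));
}

/* Steps 2-4: decide what to do once the 4k RMP entry has been read. */
static enum toy_rmp_action toy_rmp_fault_action(bool rmp_assigned,
						enum toy_pg_level host_level)
{
	if (rmp_assigned)
		return TOY_RMP_KILL;
	if (host_level == TOY_PG_LEVEL_4K)
		return TOY_RMP_RETRY;
	return TOY_RMP_SPLIT;
}
```

The assigned check comes first on purpose: it decides the outcome regardless
of the host mapping level, so the level is only consulted for pages that are
not assigned.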
So calling memory_failure() is proactively doing the same, marking the
page as poisoned and probably also killing the current process.
But the page is not suffering a memory failure - it cannot be reclaimed
for whatever reason. Btw, how can that reclaim failure ever happen? Any
real scenarios?
The scenarios here are either SNP FW failure (SNP_PAGE_RECLAIM command)
in transitioning the page back to HV state and/or RMPUPDATE instruction
failure to transition the page back to hypervisor/shared state.
Anyway, memory failure just happens to fit what you wanna do but you
can't just reuse that - that's hacky. What is the problem with writing
your own function which does that?
Ok.
Will look at adding our own recovery function for this, but that
will again mark the pages as poisoned, right?
Still waiting for some/more feedback from mm folks on the same.
Thanks,
Ashish
Also, btw, please do not top-post.
Thx.