Hi Dave,
On 7/8/21 11:16 AM, Dave Hansen wrote:
"SIGBUG"?
Its typo, it should be SIGBUS
+
+ if (unlikely(!cpu_feature_enabled(X86_FEATURE_SEV_SNP)))
+ return RMP_FAULT_KILL;
Shouldn't this be a WARN_ON_ONCE()? How can we get RMP faults without
SEV-SNP?
Yes, we should *not* get RMP fault if SEV-SNP is not enabled. I can use
the WARN_ON_ONCE().
+ /* Get the native page level */
+ pte = lookup_address_in_mm(current->mm, address, &level);
+ if (unlikely(!pte))
+ return RMP_FAULT_KILL;
What would this mean? There was an RMP fault on a non-present page?
How could that happen? What if there was a race between an unmapping
event and the RMP fault delivery?
We should not have RMP fault for non-present pages. But you have a good
point that there maybe a race between the unmap event and RMP fault.
Instead of terminating the process we should simply retry.
+ pfn = pte_pfn(*pte);
+ if (level > PG_LEVEL_4K) {
+ mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+ pfn |= (address >> PAGE_SHIFT) & mask;
+ }
This looks inherently racy. What happens if there are two parallel RMP
faults on the same 2M page. One of them splits the page tables, the
other gets a fault for an already-split page table.
> Is that handled here somehow?
Yes, in this particular case we simply retry and hardware should
re-evaluate the page level and take the corrective action.
+ /* Get the page level from the RMP entry. */
+ e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+ if (!e)
+ return RMP_FAULT_KILL;
The snp_lookup_page_in_rmptable() failure cases looks WARN-worthly.
Either you're doing a lookup for something not *IN* the RMP table, or
you don't support SEV-SNP, in which case you shouldn't be in this code
in the first place.
Noted.
+ /*
+ * Check if the RMP violation is due to the guest private page access.
+ * We can not resolve this RMP fault, ask to kill the guest.
+ */
+ if (rmpentry_assigned(e))
+ return RMP_FAULT_KILL;
No "We's", please. Speak in imperative voice.
Noted.
+ /*
+ * The backing page level is higher than the RMP page level, request
+ * to split the page.
+ */
+ if (level > rmp_level)
+ return RMP_FAULT_PAGE_SPLIT;
This can theoretically trigger on a hugetlbfs page. Right?
Yes, theoretically.
In the current implementation, the VMM is enlightened to not use the
hugetlbfs for backing page when creating the SEV-SNP guests.
I thought I asked about this before... more below...
+ return RMP_FAULT_RETRY;
+}
+
/*
* Handle faults in the user portion of the address space. Nothing in here
* should check X86_PF_USER without a specific justification: for almost
@@ -1298,6 +1350,7 @@ void do_user_addr_fault(struct pt_regs *regs,
struct task_struct *tsk;
struct mm_struct *mm;
vm_fault_t fault;
+ int ret;
unsigned int flags = FAULT_FLAG_DEFAULT;
tsk = current;
@@ -1378,6 +1431,22 @@ void
(struct pt_regs *regs,
if (error_code & X86_PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;
+ /*
+ * If its an RMP violation, try resolving it.
+ */
+ if (error_code & X86_PF_RMP) {
+ ret = handle_user_rmp_page_fault(error_code, address);
+ if (ret == RMP_FAULT_PAGE_SPLIT) {
+ flags |= FAULT_FLAG_PAGE_SPLIT;
+ } else if (ret == RMP_FAULT_KILL) {
+ fault |= VM_FAULT_SIGBUS;
+ do_sigbus(regs, error_code, address, fault);
+ return;
+ } else {
+ return;
+ }
+ }
Why not just have handle_user_rmp_page_fault() return a VM_FAULT_* code
directly?
I don't have any strong reason against it. In next rev, I can update to
use the VM_FAULT_* code and call the do_sigbus() etc.
I also suspect you can just set VM_FAULT_SIGBUS and let the do_sigbus()
call later on in the function do its work.
+static int handle_split_page_fault(struct vm_fault *vmf)
+{
+ if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+ return VM_FAULT_SIGBUS;
+
+ __split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
+ return 0;
+}
What will this do when you hand it a hugetlbfs page?
VMM is updated to not use the hugetlbfs when creating SEV-SNP guests.
So, we should not run into it.
-Brijesh