On Thu, 9 Sep 2021 18:22:43 +0200
David Hildenbrand <david@xxxxxxxxxx> wrote:

> There are multiple things broken about our storage key handling
> functions:
>
> 1. We should not walk/touch page tables outside of VMA boundaries when
>    holding only the mmap sem in read mode. Evil user space can modify the
>    VMA layout just before this function runs and e.g., trigger races with
>    page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
>    with read mmap_sem in munmap"). gfn_to_hva() will only translate using
>    KVM memory regions, but won't validate the VMA.
>
> 2. We should not allocate page tables outside of VMA boundaries: if
>    evil user space decides to map hugetlbfs to these ranges, bad things
>    will happen because we suddenly have PTE or PMD page tables where we
>    shouldn't have them.
>
> 3. We don't handle large PUDs that might suddenly appear inside our page
>    table hierarchy.
>
> Don't manually allocate page tables, properly validate that we have a VMA
> and bail out on pud_large().
>
> All callers of page table handling functions, except
> get_guest_storage_key(), call fixup_user_fault() in case they
> receive an -EFAULT and retry; this will allocate the necessary page tables
> if required.
>
> To keep get_guest_storage_key() working as expected and not requiring
> kvm_s390_get_skeys() to call fixup_user_fault(), distinguish between
> "there is simply no page table or huge page yet and the key is assumed
> to be 0" and "this is a fault to be reported".
>
> Although commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling")
> introduced most of the affected code, it was actually already broken
> before when using get_locked_pte() without any VMA checks.
>
> Note: Ever since commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key
> handling") we can no longer set a guest storage key (for example from
> QEMU during VM live migration) without actually resolving a fault.
> Although we would have created most page tables, we would choke on the
> !pmd_present(), requiring a call to fixup_user_fault(). I would
> have thought that this is problematic in combination with postcopy live
> migration ... but nobody noticed and this patch doesn't change the
> situation. So maybe it's just fine.
>
> Fixes: 9fcf93b5de06 ("KVM: S390: Create helper function get_guest_storage_key")
> Fixes: 24d5dd0208ed ("s390/kvm: Provide function for setting the guest storage key")
> Fixes: a7e19ab55ffd ("KVM: s390: handle missing storage-key facility")
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>

Reviewed-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>

> ---
>  arch/s390/mm/pgtable.c | 57 +++++++++++++++++++++++++++++-------------
>  1 file changed, 39 insertions(+), 18 deletions(-)
>
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 54969e0f3a94..5fb409ff7842 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -429,22 +429,36 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
>  }
>
>  #ifdef CONFIG_PGSTE
> -static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
> +static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
> {
> +	struct vm_area_struct *vma;
> 	pgd_t *pgd;
> 	p4d_t *p4d;
> 	pud_t *pud;
> -	pmd_t *pmd;
> +
> +	/* We need a valid VMA, otherwise this is clearly a fault. */
> +	vma = vma_lookup(mm, addr);
> +	if (!vma)
> +		return -EFAULT;
>
> 	pgd = pgd_offset(mm, addr);
> -	p4d = p4d_alloc(mm, pgd, addr);
> -	if (!p4d)
> -		return NULL;
> -	pud = pud_alloc(mm, p4d, addr);
> -	if (!pud)
> -		return NULL;
> -	pmd = pmd_alloc(mm, pud, addr);
> -	return pmd;
> +	if (!pgd_present(*pgd))
> +		return -ENOENT;
> +
> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return -ENOENT;
> +
> +	pud = pud_offset(p4d, addr);
> +	if (!pud_present(*pud))
> +		return -ENOENT;
> +
> +	/* Large PUDs are not supported yet. */
> +	if (pud_large(*pud))
> +		return -EFAULT;
> +
> +	*pmdp = pmd_offset(pud, addr);
> +	return 0;
> }
> #endif
>
> @@ -778,8 +792,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
> 	pmd_t *pmdp;
> 	pte_t *ptep;
>
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
> 		return -EFAULT;
>
> 	ptl = pmd_lock(mm, pmdp);
> @@ -881,8 +894,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
> 	pte_t *ptep;
> 	int cc = 0;
>
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
> 		return -EFAULT;
>
> 	ptl = pmd_lock(mm, pmdp);
> @@ -935,15 +947,24 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
> 	pmd_t *pmdp;
> 	pte_t *ptep;
>
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	/*
> +	 * If we don't have a PTE table and if there is no huge page mapped,
> +	 * the storage key is 0.
> +	 */
> +	*key = 0;
> +
> +	switch (pmd_lookup(mm, addr, &pmdp)) {
> +	case -ENOENT:
> +		return 0;
> +	case 0:
> +		break;
> +	default:
> 		return -EFAULT;
> +	}
>
> 	ptl = pmd_lock(mm, pmdp);
> 	if (!pmd_present(*pmdp)) {
> -		/* Not yet mapped memory has a zero key */
> 		spin_unlock(ptl);
> -		*key = 0;
> 		return 0;
> 	}
>
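
For readers following along: the caller-side contract the commit message
describes ("call fixup_user_fault() in case they receive an -EFAULT and
retry") looks roughly like the sketch below. This is illustrative only,
loosely modelled on the retry loops of the existing callers in
arch/s390/kvm; the helper name is made up, error handling is simplified,
and it assumes the mm-based fixup_user_fault() signature of recent
kernels:

	/*
	 * Minimal sketch: set a storage key, faulting in the page on
	 * demand. With this patch, a missing page table makes
	 * set_guest_storage_key() return -EFAULT instead of allocating
	 * page tables itself; fixup_user_fault() then resolves the
	 * fault (allocating page tables as needed) and we retry.
	 */
	static int set_skey_with_fault_in(struct mm_struct *mm,
					  unsigned long hva,
					  unsigned char key)
	{
		bool unlocked;
		int rc;

	retry:
		unlocked = false;
		mmap_read_lock(mm);
		rc = set_guest_storage_key(mm, hva, key, 0);
		if (rc == -EFAULT) {
			rc = fixup_user_fault(mm, hva, FAULT_FLAG_WRITE,
					      &unlocked);
			if (!rc) {
				mmap_read_unlock(mm);
				goto retry;
			}
		}
		mmap_read_unlock(mm);
		return rc;
	}

Note that get_guest_storage_key() deliberately does not need such a loop:
with the -ENOENT/-EFAULT distinction introduced here, an unpopulated
mapping simply reports a key of 0 instead of forcing the caller to
resolve a fault.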