On 13.02.2018 15:36, David Hildenbrand wrote:
> On 09.02.2018 10:34, Janosch Frank wrote:
>> For later migration of huge pages we want to write-protect guest
>> PMDs. While doing this, we have to make absolutely sure that the
>> guest's lowcore is always accessible when the VCPU is running. With
>> PTEs, this is solved by marking the PGSTEs of the lowcore pages with
>> the invalidation notification bit and kicking the guest out of the SIE
>> via a notifier function if we need to invalidate such a page.
>>
>> With PMDs we do not have PGSTEs or some other bits we could use in the
>> host PMD. Instead we pick one of the free bits in the gmap PMD. Every
>> time a host pmd is invalidated, we check whether the respective
>> gmap PMD has the bit set and in that case fire up the notifier.
>>
>> In the first step we only support setting the invalidation bit, but we
>> do not support restricting access of guest pmds. It will follow
>> shortly.
>>
>> Signed-off-by: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx>
>
> [...]
>
>> +
>> +/**
>> + * gmap_pmd_split - Split a huge gmap pmd and use a page table instead
>> + * @gmap: pointer to guest mapping meta data structure
>> + * @gaddr: virtual address in the guest address space
>> + * @pmdp: pointer to the pmd that will be split
>> + *
>> + * When splitting gmap pmds, we have to make the resulting page table
>> + * look like it's a normal one to be able to use the common pte
>> + * handling functions. Also we need to track these new tables as they
>> + * aren't tracked anywhere else.
>> + */
>> +static int gmap_pmd_split(struct gmap *gmap, unsigned long gaddr, pmd_t *pmdp)
>> +{
>> +	unsigned long *table;
>> +	struct page *page;
>> +	pmd_t new;
>> +	int i;
>> +
>
> That's interesting, because the SIE can now suddenly work on these
> PGSTEs, e.g. not leading to intercepts on certain events (like setting
> storage keys).
>
> How is that intended to be handled? I assume we would somehow have to
> forbid the SIE from making use of the PGSTE. But that involves clearing
> certain interception controls, which might be problematic.

Well, cmma is disabled, and storage keys should only be a problem when
the pte is invalid without the pgste lock held, which should never be
the case for split pmds.

>
>> +	page = page_table_alloc_pgste(gmap->mm);
>> +	if (!page)
>> +		return -ENOMEM;
>> +	table = (unsigned long *) page_to_phys(page);
>> +	for (i = 0; i < 256; i++) {
>> +		table[i] = (pmd_val(*pmdp) & HPAGE_MASK) + i * PAGE_SIZE;
>> +		/* pmd_large() implies pmd/pte_present() */
>> +		table[i] |= _PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE;
>> +		/* ptes are directly marked as dirty */
>> +		table[i + PTRS_PER_PTE] |= PGSTE_UC_BIT;
>> +	}
>> +
>> +	pmd_val(new) = ((unsigned long)table | _SEGMENT_ENTRY |
>> +			(_SEGMENT_ENTRY_GMAP_SPLIT));
>> +	list_add(&page->lru, &gmap->split_list);
>> +	gmap_pmdp_xchg(gmap, pmdp, new, gaddr);
>> +	return 0;
>> +}
>> +
>>  /*
>>   * gmap_protect_pte - remove access rights to memory and set pgste bits
>>   * @gmap: pointer to guest mapping meta data structure
>> @@ -941,7 +1021,7 @@ static int gmap_protect_pte(struct gmap *gmap, unsigned long gaddr,
>>  	spinlock_t *ptl = NULL;
>>  	unsigned long pbits = 0;
>>
>> -	ptep = pte_alloc_map_lock(gmap->mm, pmdp, gaddr, &ptl);
>> +	ptep = gmap_pte_from_pmd(gmap, pmdp, gaddr, &ptl);
>>  	if (!ptep)
>>  		return -ENOMEM;
>>
>> @@ -979,15 +1059,21 @@ static int gmap_protect_range(struct gmap *gmap, unsigned long gaddr,
>>  		rc = -EAGAIN;
>>  		pmdp = gmap_pmd_op_walk(gmap, gaddr);
>>  		if (pmdp && !(pmd_val(*pmdp) & _SEGMENT_ENTRY_INVALID)) {
>> -			rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> -					      bits);
>> -			if (!rc) {
>> -				len -= PAGE_SIZE;
>> -				gaddr += PAGE_SIZE;
>> +			if (!pmd_large(*pmdp)) {
>> +				rc = gmap_protect_pte(gmap, gaddr, pmdp, prot,
>> +						      bits);
>> +				if (!rc) {
>> +					len -= PAGE_SIZE;
>> +					gaddr += PAGE_SIZE;
>> +				}
>> +			} else {
>> +				rc = gmap_pmd_split(gmap, gaddr, pmdp);
>> +				if (!rc)
>> +					rc = -EFAULT;
>
> Not sure if -EFAULT is the right thing to do here. Actually this would
> be a much better use case for -EAGAIN.
>
> (or simply avoid this by doing a gmap_pmd_op_end() + continue;)

That is a leftover from the dark days of pmd protecting/notifying.
Will fix.

>
>>  			}
>>  			gmap_pmd_op_end(gmap, pmdp);
>>  		}
>> -		if (rc) {
>> +		if (rc && rc != -EFAULT) {
>>  			vmaddr = __gmap_translate(gmap, gaddr);
>>  			if (IS_ERR_VALUE(vmaddr))
>>  				return vmaddr;
>> @@ -2133,6 +2219,39 @@ static void gmap_shadow_notify(struct gmap *sg, unsigned long vmaddr,
>>  	spin_unlock(&sg->guest_table_lock);
>>  }
>>
>>  /**
>>   * ptep_notify - call all invalidation callbacks for a specific pte.
>>   * @mm: pointer to the process mm_struct
>> @@ -2177,6 +2296,23 @@ void ptep_notify(struct mm_struct *mm, unsigned long vmaddr,
>>  }
>>  EXPORT_SYMBOL_GPL(ptep_notify);
>>
>> +static void pmdp_notify_split(struct mm_struct *mm, unsigned long vmaddr,
>> +			      unsigned long *table)
>> +{
>> +	int i = 0;
>> +	unsigned long bits;
>> +	unsigned long *ptep = (unsigned long *)(*table & PAGE_MASK);
>> +	unsigned long *pgste = ptep + PTRS_PER_PTE;
>> +
>> +	for (; i < 256; i++, vmaddr += PAGE_SIZE, ptep++, pgste++) {
>> +		bits = *pgste & (PGSTE_IN_BIT | PGSTE_VSIE_BIT);
>> +		if (bits) {
>> +			*pgste ^= bits;
>> +			ptep_notify_gmap(mm, vmaddr, (pte_t *)ptep, bits);
>
> All existing callers of ptep_notify_gmap() are called with the pgste
> lock being held. I guess we don't need that here as we always take the
> guest_table_lock when working with split PMDs.

That's my understanding.

> Complicated stuff :)

Yes, and this is also rather horrible performance-wise, as we do
another run through the gmap list in ptep_notify_gmap. We need to do
that because we don't get the gmap in
ptep_remove_dirty_protection_split and test_and_clear_guest_dirty_split.
The reason for that is that we need the pgste lock for our dirty
changes, which we can only take in pgtable.c for more or less proper
separation of mm and gmap functions. I've yet to find a nice and clean
solution to that.

>
>> +		}
>> +	}
>> +}
>
>