If migration of a VM with hugepages is canceled midway, KVM will adjust the stage-2 table mappings back to block mappings. We currently use BBM to replace the table entry with a block entry. Take adjustment of 1G block mapping as an example, with BBM procedures, we have to invalidate the old table entry of level 1 first, flush TLB and unmap the old table mappings, right before installing the new block entry. So there will be a bit long period when the table entry of level 1 is invalid before installation of block entry, if other vCPUs access any guest page within the 1G range during this period and find the table entry invalid, they will all exit from guest with an translation fault. Actually, these translation faults are not necessary, because the block mapping will be built later. Besides, KVM will try to build 1G block mappings for these translation faults, and will perform cache maintenance operations, page table walk, etc. Approaches of TTRem level 1,2 ensure that there will be not a moment when the old table entry is invalid before installation of the new block entry, so no unnecessary translation faults will be caused. But level-2 method will possibly lead to a TLB conflict which is bothering, so we use nT both at level-1 and level-2 case to avoid handling TLB conflict aborts. Signed-off-by: Yanan Wang <wangyanan55@xxxxxxxxxx> --- arch/arm64/kvm/hyp/pgtable.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index c8b959e3951b..ab1c94985ed0 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -49,6 +49,8 @@ KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \ KVM_PTE_LEAF_ATTR_HI_S2_XN) +#define KVM_PTE_LEAF_BLOCK_S2_NT BIT(16) + struct kvm_pgtable_walk_data { struct kvm_pgtable *pgt; struct kvm_pgtable_walker *walker; @@ -502,6 +504,39 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level, return 0; } +static int stage2_coalesce_tables_into_block(u64 addr, u32 level, + kvm_pte_t *ptep, + struct stage2_map_data *data) +{ + u32 ttrem_level = data->ttrem_level; + u64 granule = kvm_granule_size(level), phys = data->phys; + kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level); + + switch (ttrem_level) { + case TTREM_LEVEL0: + kvm_set_invalid_pte(ptep); + + /* + * Invalidate the whole stage-2, as we may have numerous leaf + * entries below us which would otherwise need invalidating + * individually. + */ + kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu); + smp_store_release(ptep, new); + data->phys += granule; + return 0; + case TTREM_LEVEL1: + case TTREM_LEVEL2: + WRITE_ONCE(*ptep, new | KVM_PTE_LEAF_BLOCK_S2_NT); + kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu); + WRITE_ONCE(*ptep, new & ~KVM_PTE_LEAF_BLOCK_S2_NT); + data->phys += granule; + return 0; + } + + return -EINVAL; +} + static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level, kvm_pte_t *ptep, struct stage2_map_data *data) -- 2.19.1