On Mon, Aug 11, 2014 at 10:20:41AM -0500, Joel Schopp wrote:
> Thanks for the detailed review.
>
> the last case would be case 5 and the default case would be a BUG().
> I agree with the case, but rather than do a BUG() I'm going to print an
> error and return -EINVAL.  Not worth stopping the host kernel just
> because kvm is messed up when we can gracefully exit from it.

agreed

> >
> >> +
> >> +        /*
> >> +         * See Table D4-23 and Table D4-25 in ARM DDI 0487A.b to figure out
> >> +         * the origin of the hardcoded values, 38 and 37.
> >> +         */
> >> +#ifdef CONFIG_ARM64_64K_PAGES
> >> +        /*
> >> +         * 16 <= T0SZ <= 21 is valid under 3 levels of translation tables
> >> +         * 18 <= T0SZ <= 34 is valid under 2 levels of translation tables
> >> +         * 31 <= T0SZ <= 39 is valid under 1 level of translation tables
> >> +         */
> > so this scheme is with concatenated initial-level stage-2 page tables.
> >
> > But we only ever allocate the number of pages for our pgd according to
> > what the host has, so I think this allocation needs to be locked down
> > more tightly, because the host is always using the appropriate amount
> > for 39-bit virtual addresses.
> I'll narrow the sanity check of the range.  I'll narrow it based on a
> 39- to 48-bit VA host range, in anticipation of the 4-level 4k and
> 3-level 64k host patches going in.
>
> >> +                kvm_err("Cannot support %d-bit address space\n", 64 - t0sz);
> >> +                return -EINVAL;
> >> +        }
> >> +        vttbr_x = 37 - t0sz;
> >> +#endif
> >> +        vttbr_baddr_mask = (((1LLU << (48 - vttbr_x)) - 1) << (vttbr_x - 1));
> >> +#endif
> > This nested ifdef is really quite horrible.  Can you either factor these
> > out into some static inlines in arch/arm[64]/include/asm/kvm_mmu.h or
> > provide a per-architecture implementation in a .c file?
> I'll factor it out in the file to make it more readable and do away with
> the nested ifdef.  My theory on putting things into .h files is to only
> do it if there is actually another file that uses it.

that's not really the design principle behind the split of kvm/arm and
kvm/arm64 though, but I'll let you choose your preferred method when
writing up the patches.
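For the record, the factored-out version I'd picture is something like
the sketch below -- completely untested, and the helper name plus the
narrowed bounds (expressed with your VTCR_EL2_T0SZ macro) are only
illustrative:

static inline u64 compute_vttbr_baddr_mask(int t0sz)
{
        int vttbr_x;

        /*
         * Narrowed as discussed above: only accept T0SZ values
         * corresponding to 39- to 48-bit address spaces.
         */
        if (t0sz < VTCR_EL2_T0SZ(48) || t0sz > VTCR_EL2_T0SZ(39))
                return 0;       /* caller turns 0 into -EINVAL */

#ifdef CONFIG_ARM64_64K_PAGES
        vttbr_x = 38 - t0sz;    /* Table D4-25, ARM DDI 0487A.b */
#else
        vttbr_x = 37 - t0sz;    /* Table D4-23, ARM DDI 0487A.b */
#endif
        return ((1LLU << (48 - vttbr_x)) - 1) << (vttbr_x - 1);
}

One flat ifdef instead of the nested ones, and both page-size cases
share the range check and the final mask computation.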
> >
> >> +        return 0;
> >> +}
> >> +
> >> +/**
> >>   * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
> >>   * @kvm  The guest that we are about to run
> >>   *
> >> @@ -429,8 +502,16 @@ static void update_vttbr(struct kvm *kvm)
> >>          /* update vttbr to be used with the new vmid */
> >>          pgd_phys = virt_to_phys(kvm->arch.pgd);
> >>          vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
> >> -        kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
> >> -        kvm->arch.vttbr |= vmid;
> >> +
> >> +        /*
> >> +         * If the VTTBR isn't aligned there is something wrong with the system
> >> +         * or kernel.  It is better to just fail and not mask it.  But no need
> >> +         * to panic the host kernel with a BUG_ON(); instead just log the error.
> >> +         */
> > These last two sentences are not very helpful, because they don't
> > describe the rationale for what you're doing, only what you are (and
> > are not) doing.
> I'll reword the comment.
> >
> > That said, I don't think this is doing the right thing.  I think you
> > want to refuse running the VM and avoid any stage-2 entries being
> > created if this is not the case (actually, we may want to check this
> > after set_vttbr_baddr_mask() or right after allocating the stage-2
> > pgd), because otherwise I think we may be overwriting memory not
> > belonging to us with concatenated page tables in a 42-bit 4KB system,
> > for example.
> My experience here was that the hardware actually catches the error on
> the first instruction load of the guest kernel and does a stage 2
> translation abort.  However, to be extra safe we could just log the
> error with the address of the vttbr and then zero out the pgd_phys part
> of the vttbr altogether, leaving only the vmid.  The guest would then
> die of natural causes and we wouldn't have to worry about the outside
> possibility of memory getting overwritten.

uh, putting zero in the pgd_phys part will just point to random memory
if you happen to have memory based at address 0, though, right?

I think we should check when we allocate the pgd that it is indeed of
the right size and alignment, and if it isn't at that point, it truly is
a BUG() and your kernel is terribly busted.

>
> I don't like the option of just calling BUG() because stopping the host
> kernel from running just because we can't run a guest seems a bit
> extreme.  On the other hand, adding a return code to update_vttbr and
> checking it (even with unlikely) in the kvm_arch_vcpu_ioctl_run() loop
> doesn't thrill me either, just from a wasted-cpu-cycles point of view.

Agreed that we shouldn't call BUG() on something that could happen; we
should only do so if we really do have a bug.  If we have a programming
error in setting up our stage-2 page tables, then we're hosed, and I
think crashing the kernel is fine.  You could make it a VM_BUG_ON() if
you don't like having the check run regularly, or create the error code
return.
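To make the allocation-time check concrete, here is roughly what I mean
-- an untested sketch reusing the existing kvm_alloc_stage2_pgd() and
S2_PGD_ORDER names, assuming vttbr_baddr_mask has already been computed
at init time, and with the cache maintenance in the real function
trimmed:

int kvm_alloc_stage2_pgd(struct kvm *kvm)
{
        pgd_t *pgd;

        if (kvm->arch.pgd != NULL) {
                kvm_err("kvm_arch already initialized?\n");
                return -EINVAL;
        }

        pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER);
        if (!pgd)
                return -ENOMEM;

        /*
         * A pgd violating the BADDR size/alignment requirements at this
         * point means the stage-2 configuration itself is broken, which
         * really is a kernel bug.
         */
        VM_BUG_ON(virt_to_phys(pgd) & ~vttbr_baddr_mask);

        kvm->arch.pgd = pgd;
        return 0;
}

That keeps the check out of the run loop entirely.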
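And if the error-return route wins out after all, the run-loop cost
really is just one predictable branch; again a sketch only, with the
vmid generation/rollover handling elided:

static int update_vttbr(struct kvm *kvm)
{
        phys_addr_t pgd_phys;
        u64 vmid;

        /* ... vmid generation and locking as before ... */

        pgd_phys = virt_to_phys(kvm->arch.pgd);
        if (unlikely(pgd_phys & ~vttbr_baddr_mask))
                return -EINVAL; /* refuse to run the VM */

        vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
        kvm->arch.vttbr = pgd_phys | vmid;
        return 0;
}

with kvm_arch_vcpu_ioctl_run() bailing out when it sees a non-zero
return.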
> >
> >> +        if (pgd_phys & ~vttbr_baddr_mask)
> >> +                kvm_err("VTTBR not aligned, expect guest to fail");
> >> +
> >> +        kvm->arch.vttbr = pgd_phys | vmid;
> >>
> >>          spin_unlock(&kvm_vmid_lock);
> >>  }
> >> @@ -1015,6 +1096,12 @@ int kvm_arch_init(void *opaque)
> >>                  }
> >>          }
> >>
> >> +        err = set_vttbr_baddr_mask();
> >> +        if (err) {
> >> +                kvm_err("Cannot set vttbr_baddr_mask\n");
> >> +                return -EINVAL;
> >> +        }
> >> +
> >>          cpu_notifier_register_begin();
> >>
> >>          err = init_hyp_mode();
> >> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> >> index cc83520..ff4a4fa 100644
> >> --- a/arch/arm64/include/asm/kvm_arm.h
> >> +++ b/arch/arm64/include/asm/kvm_arm.h
> >> @@ -95,7 +95,6 @@
> >>  /* TCR_EL2 Registers bits */
> >>  #define TCR_EL2_TBI     (1 << 20)
> >>  #define TCR_EL2_PS      (7 << 16)
> >> -#define TCR_EL2_PS_40B  (2 << 16)
> >>  #define TCR_EL2_TG0     (1 << 14)
> >>  #define TCR_EL2_SH0     (3 << 12)
> >>  #define TCR_EL2_ORGN0   (3 << 10)
> >> @@ -104,8 +103,6 @@
> >>  #define TCR_EL2_MASK    (TCR_EL2_TG0 | TCR_EL2_SH0 | \
> >>                           TCR_EL2_ORGN0 | TCR_EL2_IRGN0 | TCR_EL2_T0SZ)
> >>
> >> -#define TCR_EL2_FLAGS   (TCR_EL2_PS_40B)
> >> -
> >>  /* VTCR_EL2 Registers bits */
> >>  #define VTCR_EL2_PS_MASK        (7 << 16)
> >>  #define VTCR_EL2_TG0_MASK       (1 << 14)
> >> @@ -120,36 +117,28 @@
> >>  #define VTCR_EL2_SL0_MASK       (3 << 6)
> >>  #define VTCR_EL2_SL0_LVL1       (1 << 6)
> >>  #define VTCR_EL2_T0SZ_MASK      0x3f
> >> -#define VTCR_EL2_T0SZ_40B       24
> >> +#define VTCR_EL2_T0SZ(bits)     (64 - (bits))
> >>
> >>  #ifdef CONFIG_ARM64_64K_PAGES
> >>  /*
> >>   * Stage2 translation configuration:
> >> - * 40bits output (PS = 2)
> >> - * 40bits input (T0SZ = 24)
> >>   * 64kB pages (TG0 = 1)
> >>   * 2 level page tables (SL = 1)
> >>   */
> >>  #define VTCR_EL2_FLAGS  (VTCR_EL2_TG0_64K | VTCR_EL2_SH0_INNER | \
> >>                           VTCR_EL2_ORGN0_WBWA | VTCR_EL2_IRGN0_WBWA | \
> >> -                         VTCR_EL2_SL0_LVL1 | VTCR_EL2_T0SZ_40B)
> >> -#define VTTBR_X         (38 - VTCR_EL2_T0SZ_40B)
> >> +                         VTCR_EL2_SL0_LVL1)
> >>  #else
> >>  /*
> >>   * Stage2 translation configuration:
> >> - * 40bits output (PS = 2)
> >> - * 40bits input (T0SZ = 24)
> >>   * 4kB pages (TG0 = 0)
> >>   * 3 level page tables (SL = 1)
> >>   */
> >>  #define VTCR_EL2_FLAGS  (VTCR_EL2_TG0_4K | VTCR_EL2_SH0_INNER | \
> >>                           VTCR_EL2_ORGN0_WBWA | VTCR_EL2_IRGN0_WBWA | \
> >> -                         VTCR_EL2_SL0_LVL1 | VTCR_EL2_T0SZ_40B)
> >> -#define VTTBR_X         (37 - VTCR_EL2_T0SZ_40B)
> >> +                         VTCR_EL2_SL0_LVL1)
> >>  #endif
> >>
> >> -#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
> >> -#define VTTBR_BADDR_MASK  (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
> >>  #define VTTBR_VMID_SHIFT  (48LLU)
> >>  #define VTTBR_VMID_MASK   (0xffLLU << VTTBR_VMID_SHIFT)
> >>
> >> diff --git a/arch/arm64/kvm/hyp-init.S b/arch/arm64/kvm/hyp-init.S
> >> index d968796..c0f7634 100644
> >> --- a/arch/arm64/kvm/hyp-init.S
> >> +++ b/arch/arm64/kvm/hyp-init.S
> >> @@ -63,17 +63,21 @@ __do_hyp_init:
> >>          mrs     x4, tcr_el1
> >>          ldr     x5, =TCR_EL2_MASK
> >>          and     x4, x4, x5
> >> -        ldr     x5, =TCR_EL2_FLAGS
> >> -        orr     x4, x4, x5
> >> -        msr     tcr_el2, x4
> >> -
> >> -        ldr     x4, =VTCR_EL2_FLAGS
> >>          /*
> >>           * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS bits in
> >> -         * VTCR_EL2.
> >> +         * TCR_EL2 and both PS bits and T0SZ bits in VTCR_EL2.
> >>           */
> >>          mrs     x5, ID_AA64MMFR0_EL1
> >>          bfi     x4, x5, #16, #3
> >> +        msr     tcr_el2, x4
> >> +
> >> +        ldr     x4, =VTCR_EL2_FLAGS
> >> +        bfi     x4, x5, #16, #3
> >> +        and     x5, x5, #0xf
> >> +        adr     x6, t0sz
> >> +        add     x6, x6, x5, lsl #2
> >> +        ldr     w5, [x6]
> >> +        orr     x4, x4, x5
> >>          msr     vtcr_el2, x4
> >>
> >>          mrs     x4, mair_el1
> >> @@ -109,6 +113,10 @@ target: /* We're now in the trampoline code, switch page tables */
> >>
> >>          /* Hello, World! */
> >>          eret
> >> +
> >> +t0sz:
> >> +        .word   VTCR_EL2_T0SZ(32), VTCR_EL2_T0SZ(36), VTCR_EL2_T0SZ(40)
> >> +        .word   VTCR_EL2_T0SZ(42), VTCR_EL2_T0SZ(44), VTCR_EL2_T0SZ(48)
> >>  ENDPROC(__kvm_hyp_init)
> >>
> >>          .ltorg
> >>
> > Thanks for picking this up!
> Thanks for the detailed review.  I'll try to turn around v4 quickly.

Cool!

-Christoffer