On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
>
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. And because of
> the mechanism described above can be expected to not exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ to one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
>
> To avoid this outcome have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@xxxxxxxxx>
> ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Hi,

Tested this patch and now Power8 compat guest boots fine with mem >1024G on Power9 host.

Tested-by: Satheesh Rajendran <sathnaga@xxxxxxxxxxxxxxxxxx>

Host: P9; kernel: 5.2.0-00915-g5ad18b2e60b7

Before this patch: Guest crashes..
[    0.000000] BUG: Kernel NULL pointer dereference at 0x00000028
[    0.000000] Faulting instruction address: 0xc00000000102caa0
[    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.000000] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-03135-ge9a83bd23220 #24
[    0.000000] NIP: c00000000102caa0 LR: c00000000102ca84 CTR: 0000000000000000
[    0.000000] REGS: c000000001603ba0 TRAP: 0380 Not tainted (5.2.0-03135-ge9a83bd23220)
[    0.000000] MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 24000428 XER: 20000000
[    0.000000] CFAR: c00000000102c1d8 IRQMASK: 1
[    0.000000] GPR00: c00000000102ca84 c000000001603e30 c000000001605300 0000010000000000
[    0.000000] GPR04: 0000000000000000 0000000000000000 c00000ffffff8000 c000000001863dc8
[    0.000000] GPR08: 0000000000002028 0000000000000000 c00000ffffff8000 0000000000000009
[    0.000000] GPR12: 0000000000000000 c0000000018f0000 000000007dc5fef0 00000000012e1220
[    0.000000] GPR16: 00000000012e10a0 fffffffffffffffd 000000007dc5fef0 000000000130fcc0
[    0.000000] GPR20: 0000000000000014 0000000001a80000 000000002fff0000 fffffffffffffffd
[    0.000000] GPR24: 0000000001d0000c c000000000000000 c000000001641ed8 c000000001641b78
[    0.000000] GPR28: 0000000000000000 0000000000000000 0000010000000000 0000000000000000
[    0.000000] NIP [c00000000102caa0] emergency_stack_init+0xb8/0x118
[    0.000000] LR [c00000000102ca84] emergency_stack_init+0x9c/0x118
[    0.000000] Call Trace:
[    0.000000] [c000000001603e30] [c00000000102ca84] emergency_stack_init+0x9c/0x118 (unreliable)
[    0.000000] [c000000001603e80] [c00000000102bd54] setup_arch+0x2fc/0x388
[    0.000000] [c000000001603ef0] [c000000001023ccc] start_kernel+0xa4/0x660
[    0.000000] [c000000001603f90] [c00000000000b774] start_here_common+0x1c/0x528
[    0.000000] Instruction dump:
[    0.000000] 7ffc07b4 7fc3f378 7bfd1f24 7f84e378 4bfff6e9 3f620004 3b7bc878 7f84e378
[    0.000000] 39434000 7fc3f378 e93b0000 7d29e82a <f9490028> 4bfff6c5 e93b0000 7f84e378
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000]
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!

-------------------------
With this patch:

# virsh start --console p8
Domain p8 started
Connected to domain p8
..
..
Fedora 27 (Twenty Seven)
Kernel 5.2.0-03136-gf709b0494ad9 on an ppc64le (hvc0)

atest-guest login:

# free -g
              total        used        free      shared  buff/cache   available
Mem:           1028           0        1027           0           0        1025
Swap:             0           0

Regards,
-Satheesh.

>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..4d0e2cce9cd5 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  	 *
>  	 * For guests on platforms before POWER9, we clamp the it limit to 1G
>  	 * to avoid some funky things such as RTAS bugs etc...
> +	 * On POWER9 we limit to 1TB in case the host erroneously told us that
> +	 * the RMA was >1TB. Effective address bits 0:23 are treated as zero
> +	 * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
> +	 * for virtual real mode addressing and so it doesn't make sense to
> +	 * have an area larger than 1TB as it can't be addressed.
>  	 */
>  	if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
>  		ppc64_rma_size = first_memblock_size;
>  		if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
>  			ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> +		else
> +			ppc64_rma_size = min_t(u64, ppc64_rma_size,
> +					       1UL << SID_SHIFT_1T);
>
>  		/* Finally limit subsequent allocations */
>  		memblock_set_current_limit(ppc64_rma_size);
> --
> 2.13.6
>
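
For anyone following along, here is a rough userspace sketch of the arithmetic the quoted comment describes (addr = addr % 1TB, plus the new clamp). It is not kernel code: the demo_* names are made up for illustration, the shift of 40 is assumed to match SID_SHIFT_1T, and the is_arch_300 flag stands in for early_cpu_has_feature(CPU_FTR_ARCH_300) on a guest (no CPU_FTR_HVMODE).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same value as the kernel's SID_SHIFT_1T, so 1 << 40 == 1TB. */
#define DEMO_SID_SHIFT_1T 40
#define DEMO_1TB (UINT64_C(1) << DEMO_SID_SHIFT_1T)
#define DEMO_1GB UINT64_C(0x40000000)

/*
 * Hypothetical helper: models VRMA aliasing as described in the commit
 * message. EA bits 0:23 (the 24 most significant bits in IBM numbering)
 * are treated as zero, leaving the low 40 bits, i.e. addr % 1TB.
 */
static inline uint64_t demo_vrma_alias(uint64_t ea)
{
	return ea & (DEMO_1TB - 1);
}

/*
 * Hypothetical helper: models the guest-side clamp from the patch.
 * Before POWER9 the limit is 1G; on POWER9 (ARCH_300) it is 1TB.
 */
static inline uint64_t demo_rma_limit(uint64_t first_memblock_size,
				      int is_arch_300)
{
	uint64_t cap = is_arch_300 ? DEMO_1TB : DEMO_1GB;

	return first_memblock_size < cap ? first_memblock_size : cap;
}

/* Small self-check covering the cases discussed in this thread. */
static inline void demo_self_check(void)
{
	uint64_t guest_mem = UINT64_C(1028) << 30;	/* 1028G first memblock */

	/* A real mode access 4K above 1TB lands on 4K. */
	assert(demo_vrma_alias(DEMO_1TB + 0x1000) == 0x1000);
	/* Addresses below 1TB are unchanged. */
	assert(demo_vrma_alias(0x1000) == 0x1000);
	/* With the patch, a >1TB first memblock is clamped to 1TB on POWER9. */
	assert(demo_rma_limit(guest_mem, 1) == DEMO_1TB);
	/* The existing pre-POWER9 path still clamps to 1G. */
	assert(demo_rma_limit(guest_mem, 0) == DEMO_1GB);
}
```

With a 1028G first memblock, as in the guest tested above, the sketch gives a 1TB limit on the POWER9 path, so nothing allocated under that limit would sit at an address that aliases in real mode.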