On Tue, 30 Mar 2021 at 15:56, Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Tue, 30 Mar 2021 14:15:19 +0100,
> Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >
> > On Tue, 30 Mar 2021 at 15:04, Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > >
> > > On Tue, 30 Mar 2021 13:49:18 +0100,
> > > Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > > > Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > > > order to maximize the space available to the linear region. After this
> > > > > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > > > > as it turns out, this violates an assumption in the KVM init code when
> > > > > > it chooses the layout for the nVHE EL2 mapping.
> > > > > >
> > > > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > > > supports 4k pages), and that in general, loaders cannot assume that the
> > > > > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > > > > we can safely assume that the kernel, and therefore the .idmap section,
> > > > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > > > >
> > > > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > > > window starting at address 0x0, containing the ID map and the
> > > > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > > > > linear region.
> > > > > > (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > > > > size, so it is slightly larger, but this only matters on systems where
> > > > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > > > >
> > > > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > > > necessarily because I have that much memory, but because my system has
> > > > > multiple memory banks, one of which lands on that spot, I cannot map
> > > > > such memory at EL2. We'll explode at run time.
> > > > >
> > > > > Can we keep the private mapping to 47 bits and restore the missing
> > > > > chunk to the linear mapping? Of course, it means that the linear map
> > > > > is now potentially not linear anymore, so we'd have to guarantee that the
> > > > > kernel lives in the first 2^47 bits instead. Crap.
> > > > >
> > > >
> > > > Yeah. The linear region needs to be contiguous. Alternatively, we
> > > > could restrict the upper address limit for loading the kernel to 47
> > > > bits.
> > >
> > > Is that something we can do retroactively? We could mandate it for
> > > LVA systems only, but that's a bit odd.
> > >
> >
> > Yeah, especially given the fact that LVA systems will be VHE capable
> > and may therefore not care in the first place.
> >
> > On systems that have memory that high, EFI is likely to load the
> > kernel there, as it usually allocates from the top down, and it tries
> > to avoid having to move it around unless asked to (via KASLR), in
> > which case it will currently randomize over the entire available
> > memory space.
> >
> > So it is going to add a special case for a corner^2 case, i.e., nVHE
> > on 52-bit/64k pages with more than 3968 TB distance between the start
> > and end of DRAM. Ugh.
>
> Yeah. I'd rather we ignore that memory altogether, but I don't think
> we can.
>
> > It seems to me that the only way to solve this is to permit the idmap
> > and the hyp linear region to overlap, and use the 2^47 byte window at
> > the top of the address space for the hyp private mappings instead of
> > the one at the bottom.
>
> But that's the hard problem I want to avoid thinking of.
>
> We need to ensure that there is no EL1 VA that is congruent with the
> idmap over the kern_hyp_va() transformation. It means imposing
> restrictions over the EL1 linear map, and preventing any allocation that
> would result in this overlap (and that is including text).
>
> How do we do that?
>

A phys to virt offset of 0x0 is perfectly acceptable, no? The only
difference is that the idmapped bits are in another part of the VA
space.

> Frankly, I think we need to start looking into enabling VHE for the
> nVHE /behaviour/. Having a single TTBR on these systems is just
> insane.
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
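
For reference, the region sizes quoted in this thread can be checked with a
few lines of arithmetic. This is only a back-of-the-envelope sketch in
Python: the variable names are made up for illustration and do not
correspond to kernel symbols, and it assumes "TB" in the thread means TiB
(2^40 bytes).

```python
# Hypothetical names throughout; none of these are kernel symbols.
TIB = 1 << 40                 # assumption: "TB" in the thread means 2^40 bytes

va_space = 1 << 52            # 52-bit VA space

# Proposed nVHE EL2 layout: a 2^48 byte window at address 0x0 (ID map plus
# hyp private mappings), followed by a contiguous linear region.
el2_window = 1 << 48
el2_linear = va_space - el2_window      # 2^52 - 2^48 bytes

# EL1 linear region size as stated in the commit message.
el1_linear = va_space - (1 << 47)       # 2^52 - 2^47 bytes

# The part of the EL1 linear region that has no room at EL2, i.e. the
# [2^52 - 2^48, 2^52 - 2^47] range raised in the reply.
uncovered = el1_linear - el2_linear


def to_tib(nbytes):
    """Convert a byte count to whole TiB."""
    return nbytes // TIB


print("EL2 linear region:", to_tib(el2_linear), "TiB")
print("EL1 linear region:", to_tib(el1_linear), "TiB")
print("Not mappable at EL2:", to_tib(uncovered), "TiB")
```

Under these assumptions the EL2 linear region comes to 3840 TiB and the EL1
linear region to 3968 TiB, so the gap between them is 2^47 bytes (128 TiB).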