On Thu, Dec 08, 2022 at 07:01:57PM +0000, Sean Christopherson wrote:
> On Thu, Dec 08, 2022, Ricardo Koller wrote:
> > On Thu, Dec 08, 2022 at 12:37:23AM +0000, Oliver Upton wrote:
> > > On Thu, Dec 08, 2022 at 12:24:20AM +0000, Sean Christopherson wrote:
> > > > > Even still, that's just a kludge to make ucalls work. We have other
> > > > > MMIO devices (GIC distributor, for example) that work by chance since
> > > > > nothing conflicts with the constant GPAs we've selected in the tests.
> > > > >
> > > > > I'd rather we go down the route of having an address allocator for the
> > > > > for both the VA and PA spaces to provide carveouts at runtime.
> > > >
> > > > Aren't those two separate issues? The PA, a.k.a. memslots space, can be solved
> > > > by allocating a dedicated memslot, i.e. doesn't need a carve. At worst, collisions
> > > > will yield very explicit asserts, which IMO is better than whatever might go wrong
> > > > with a carve out.
> > >
> > > Perhaps the use of the term 'carveout' wasn't right here.
> > >
> > > What I'm suggesting is we cannot rely on KVM memslots alone to act as an
> > > allocator for the PA space. KVM can provide devices to the guest that
> > > aren't represented as memslots. If we're trying to fix PA allocations
> > > anyway, why not make it generic enough to suit the needs of things
> > > beyond ucalls?
> >
> > One extra bit of information: in arm, IO is any access to an address (within
> > bounds) not backed by a memslot. Not the same as x86 where MMIO are writes to
> > read-only memslots. No idea what other arches do.
>
> I don't think that's correct, doesn't this code turn write abort on a RO memslot
> into an io_mem_abort()? Specifically, the "(write_fault && !writable)" check will
> match, and assuming none the the edge cases in the if-statement fire, KVM will
> send the write down io_mem_abort().

You are right. In fact, page_fault_test checks precisely that: writes on RO
memslots are sent to userspace as an mmio exit.
I was just referring to the MMIO done for ucall. Having said that, we could
implement ucall as writes to read-only memslots, like x86 does.

>
>	gfn = fault_ipa >> PAGE_SHIFT;
>	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>	write_fault = kvm_is_write_fault(vcpu);
>	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
>		/*
>		 * The guest has put either its instructions or its page-tables
>		 * somewhere it shouldn't have. Userspace won't be able to do
>		 * anything about this (there's no syndrome for a start), so
>		 * re-inject the abort back into the guest.
>		 */
>		if (is_iabt) {
>			ret = -ENOEXEC;
>			goto out;
>		}
>
>		if (kvm_vcpu_abt_iss1tw(vcpu)) {
>			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>			ret = 1;
>			goto out_unlock;
>		}
>
>		/*
>		 * Check for a cache maintenance operation. Since we
>		 * ended-up here, we know it is outside of any memory
>		 * slot. But we can't find out if that is for a device,
>		 * or if the guest is just being stupid. The only thing
>		 * we know for sure is that this range cannot be cached.
>		 *
>		 * So let's assume that the guest is just being
>		 * cautious, and skip the instruction.
>		 */
>		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
>			kvm_incr_pc(vcpu);
>			ret = 1;
>			goto out_unlock;
>		}
>
>		/*
>		 * The IPA is reported as [MAX:12], so we need to
>		 * complement it with the bottom 12 bits from the
>		 * faulting VA. This is always 12 bits, irrespective
>		 * of the page size.
>		 */
>		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
>		ret = io_mem_abort(vcpu, fault_ipa);
>		goto out_unlock;
>	}
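
To spell out the order of the checks in the hunk quoted above, here is a rough
stand-alone model of it. This is only a sketch and not kernel code: the enum,
the struct and the function names are all made up for illustration, and the
"normal fault" outcome stands in for the rest of the fault handler, which is
not part of the quote.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels; none of these names exist in the kernel. */
enum abort_action {
	ABORT_NORMAL_FAULT,	/* condition not met: regular fault path (not quoted) */
	ABORT_FAIL_ENOEXEC,	/* instruction abort: ret = -ENOEXEC */
	ABORT_INJECT_DABT,	/* stage-1 page-table walk: kvm_inject_dabt() */
	ABORT_SKIP_INSN,	/* cache maintenance outside any memslot: kvm_incr_pc() */
	ABORT_TO_MMIO,		/* everything else: io_mem_abort() */
};

/* Hypothetical flattened inputs standing in for the helpers in the hunk. */
struct abort_info {
	bool hva_is_error;	/* models kvm_is_error_hva(hva) */
	bool write_fault;	/* models kvm_is_write_fault(vcpu) */
	bool writable;		/* models the 'writable' output of the hva lookup */
	bool is_iabt;		/* models is_iabt */
	bool is_s1ptw;		/* models kvm_vcpu_abt_iss1tw(vcpu) */
	bool is_cm;		/* models kvm_vcpu_dabt_is_cm(vcpu) */
};

/* Same check order as the quoted hunk, with side effects replaced by labels. */
enum abort_action classify_abort(struct abort_info a)
{
	if (!(a.hva_is_error || (a.write_fault && !a.writable)))
		return ABORT_NORMAL_FAULT;
	if (a.is_iabt)
		return ABORT_FAIL_ENOEXEC;
	if (a.is_s1ptw)
		return ABORT_INJECT_DABT;
	if (a.hva_is_error && a.is_cm)
		return ABORT_SKIP_INSN;
	return ABORT_TO_MMIO;
}

/* The two cases discussed in this thread, plus one edge case. */
void check_examples(void)
{
	/* Write to a read-only memslot: hva is valid, but not writable. */
	assert(classify_abort((struct abort_info){ .write_fault = true }) ==
	       ABORT_TO_MMIO);
	/* Access to an address with no memslot behind it (the ucall case). */
	assert(classify_abort((struct abort_info){ .hva_is_error = true }) ==
	       ABORT_TO_MMIO);
	/* Cache maintenance on a read-only memslot is not skipped. */
	assert(classify_abort((struct abort_info){ .write_fault = true,
						   .is_cm = true }) ==
	       ABORT_TO_MMIO);
}
```

Calling check_examples() from a test harness exercises both the "no memslot"
and the "write to RO memslot" cases; in this model they end up at the same
outcome, which is the point made above.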