Hi Marc,

On Wed, Jul 27, 2022 at 10:52:34AM +0100, Marc Zyngier wrote:
> On Wed, 27 Jul 2022 10:30:59 +0100,
> Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On Tue, 26 Jul 2022 18:51:21 +0100,
> > Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> > >
> > > Doesn't pinning the buffer also imply pinning the stage 1 tables
> > > responsible for its translation as well? I agree that pinning the buffer
> > > is likely the best way forward as pinning the whole of guest memory is
> > > entirely impractical.
>
> Huh, I just realised that you were talking about S1. I don't think we
> need to do this. As long as the translation falls into a mapped
> region (pinned or not), we don't need to worry.
>
> If we get a S2 translation fault from SPE, we just go and map it. And
> TBH the pinning here is just a optimisation against things like swap,
> KSM and similar things. The only thing we need to make sure is that
> the fault is handled in the context of the vcpu that owns this SPU.
>
> Alex, can you think of anything that would cause a problem (other than
> performance and possible blackout windows) if we didn't do any pinning
> at all and just handled the SPE interrupts as normal page faults?

PMBSR_EL1.DL might be set to 1 as a result of a stage 2 fault reported by
SPE, which means the last record written is incomplete. Records have a
variable size, so it's impossible for KVM to revert to the end of the last
known good record without parsing the buffer (references here [1]). And even
if KVM knew the size of a record, there's this bit in the Arm ARM which
worries me (ARM DDI 0487H.a, page D10-5177):

"The architecture does not require that a sample record is written
sequentially by the SPU, only that:
[..]
- On a Profiling Buffer management interrupt, PMBSR_EL1.DL indicates whether
  PMBPTR_EL1 points to the first byte after the last complete sample record."
So there might be gaps in the buffer, meaning that the entire buffer would
have to be discarded if DL is set as a result of a stage 2 fault.

Also, I'm not sure if you're aware of this, but SPE reports the guest VA in
PMBPTR_EL1 (not the IPA) on a fault, so KVM would have to walk the guest's
stage 1 tables to service the fault, which would add to the overhead. I
don't know if that makes a difference, I just thought I should mention it as
another peculiarity of SPE.

[1] https://lore.kernel.org/all/Yl7KewpTj+7NSonf@monolith.localdoman/

Thanks,
Alex
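
To make the DL concern concrete, here is a minimal, untested sketch of the
decision KVM would face when it sees a buffer management event. It is not
existing KVM code: the enum and spe_classify_s2_fault() are hypothetical names
used only for illustration, and the PMBSR_EL1 field positions (S at bit 17, DL
at bit 19, EC in bits [31:26], EC 0x25 for a stage 2 abort on a buffer write)
are my reading of the Arm ARM and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMBSR_EL1 fields; positions assumed from the Arm ARM, please verify. */
#define PMBSR_EL1_S           (UINT64_C(1) << 17) /* management event pending */
#define PMBSR_EL1_DL          (UINT64_C(1) << 19) /* data loss */
#define PMBSR_EL1_EC_SHIFT    26
#define PMBSR_EL1_EC_MASK     UINT64_C(0x3f)
#define PMBSR_EL1_EC_FAULT_S2 UINT64_C(0x25)      /* stage 2 abort on buffer write */

/* Hypothetical outcome of inspecting PMBSR_EL1 after an SPE interrupt. */
enum spe_fault_action {
	SPE_NOT_S2_FAULT,    /* not a stage 2 fault, handle elsewhere */
	SPE_MAP_AND_RESUME,  /* map the page, buffer contents still usable */
	SPE_MAP_AND_DISCARD, /* map the page, but DL is set: drop the buffer */
};

/* Extract the event class from a PMBSR_EL1 value. */
static inline uint64_t pmbsr_ec(uint64_t pmbsr)
{
	return (pmbsr >> PMBSR_EL1_EC_SHIFT) & PMBSR_EL1_EC_MASK;
}

/*
 * Hypothetical helper: decide what to do with the profiling buffer.
 * With DL set, records may be incomplete or non-contiguous, so the only
 * safe option without parsing the buffer is to discard all of it.
 */
enum spe_fault_action spe_classify_s2_fault(uint64_t pmbsr)
{
	if (!(pmbsr & PMBSR_EL1_S) || pmbsr_ec(pmbsr) != PMBSR_EL1_EC_FAULT_S2)
		return SPE_NOT_S2_FAULT;

	if (pmbsr & PMBSR_EL1_DL)
		return SPE_MAP_AND_DISCARD;

	return SPE_MAP_AND_RESUME;
}
```

The sketch deliberately leaves out the stage 1 walk needed to turn the guest VA
in PMBPTR_EL1 into an IPA; that would have to happen before any mapping in
either of the two "map" cases.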