Re: [PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation

Hi,

On Wed, Jul 31, 2024 at 11:18:06AM +0100, Marc Zyngier wrote:
> On Wed, 31 Jul 2024 10:53:14 +0100,
> Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
> > 
> > Hi,
> > 
> > On Wed, Jul 31, 2024 at 09:55:28AM +0100, Marc Zyngier wrote:
> > > On Mon, 29 Jul 2024 16:26:00 +0100,
> > > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
> > > > 
> > > > Hi Marc,
> > > > 
> > > > On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> > > > > In order to plug the brokenness of our current AT implementation,
> > > > > we need a SW walker that is going to... err.. walk the S1 tables
> > > > > and tell us what it finds.
> > > > > 
> > > > > Of course, it builds on top of our S2 walker, and shares similar
> > > > > concepts. The beauty of it is that since it uses kvm_read_guest(),
> > > > > it is able to bring back pages that have been otherwise evicted.
> > > > > 
> > > > > This is then plugged in the two AT S1 emulation functions as
> > > > > a "slow path" fallback. I'm not sure it is that slow, but hey.
> > > > > 
> > > > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> > > > > ---
> > > > >  arch/arm64/kvm/at.c | 538 ++++++++++++++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 520 insertions(+), 18 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> > > > > index 71e3390b43b4c..8452273cbff6d 100644
> > > > > --- a/arch/arm64/kvm/at.c
> > > > > +++ b/arch/arm64/kvm/at.c
> > > > > @@ -4,9 +4,305 @@
> > > > >   * Author: Jintack Lim <jintack.lim@xxxxxxxxxx>
> > > > >   */
> > > > >  
> > > > > +#include <linux/kvm_host.h>
> > > > > +
> > > > > +#include <asm/esr.h>
> > > > >  #include <asm/kvm_hyp.h>
> > > > >  #include <asm/kvm_mmu.h>
> > > > >  
> > > > > +struct s1_walk_info {
> > > > > +	u64	     baddr;
> > > > > +	unsigned int max_oa_bits;
> > > > > +	unsigned int pgshift;
> > > > > +	unsigned int txsz;
> > > > > +	int 	     sl;
> > > > > +	bool	     hpd;
> > > > > +	bool	     be;
> > > > > +	bool	     nvhe;
> > > > > +	bool	     s2;
> > > > > +};
> > > > > +
> > > > > +struct s1_walk_result {
> > > > > +	union {
> > > > > +		struct {
> > > > > +			u64	desc;
> > > > > +			u64	pa;
> > > > > +			s8	level;
> > > > > +			u8	APTable;
> > > > > +			bool	UXNTable;
> > > > > +			bool	PXNTable;
> > > > > +		};
> > > > > +		struct {
> > > > > +			u8	fst;
> > > > > +			bool	ptw;
> > > > > +			bool	s2;
> > > > > +		};
> > > > > +	};
> > > > > +	bool	failed;
> > > > > +};
> > > > > +
> > > > > +static void fail_s1_walk(struct s1_walk_result *wr, u8 fst, bool ptw, bool s2)
> > > > > +{
> > > > > +	wr->fst		= fst;
> > > > > +	wr->ptw		= ptw;
> > > > > +	wr->s2		= s2;
> > > > > +	wr->failed	= true;
> > > > > +}
> > > > > +
> > > > > +#define S1_MMU_DISABLED		(-127)
> > > > > +
> > > > > +static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
> > > > > +			 struct s1_walk_result *wr, const u64 va, const int el)
> > > > > +{
> > > > > +	u64 sctlr, tcr, tg, ps, ia_bits, ttbr;
> > > > > +	unsigned int stride, x;
> > > > > +	bool va55, tbi;
> > > > > +
> > > > > +	wi->nvhe = el == 2 && !vcpu_el2_e2h_is_set(vcpu);
> > > > 
> > > > Where 'el' is computed in handle_at_slow() as:
> > > > 
> > > > 	/*
> > > > 	 * We only get here from guest EL2, so the translation regime
> > > > 	 * AT applies to is solely defined by {E2H,TGE}.
> > > > 	 */
> > > > 	el = (vcpu_el2_e2h_is_set(vcpu) &&
> > > > 	      vcpu_el2_tge_is_set(vcpu)) ? 2 : 1;
> > > > 
> > > > I think 'nvhe' will always be false ('el' is 2 only when E2H is
> > > > set).
> > > 
> > > Yeah, there are a number of problems here. el should depend on both the
> > > instruction (some are EL2-specific) and the HCR control bits. I'll
> > > tackle that now.
> > 
> > Yeah, I also noticed that the way sctlr, tcr and ttbr are chosen in
> > setup_s1_walk() doesn't look quite right for the nvhe case.
> 
> Are you sure? Assuming the 'el' value is correct (and I think I fixed
> that on my local branch), they seem correct to me (we check for va55
> early in the function to avoid a later issue).
> 
> Can you point out what exactly fails in that logic?

I was trying to say that another consequence of el being 1 in the nvhe case was
that sctlr, tcr and ttbr were read from the EL1 variants of the registers,
instead of EL2. Sorry if that wasn't clear.
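
To make that concrete, here is a rough standalone sketch (plain C, no kernel
headers, so not the actual at.c code). compute_el() and compute_nvhe() mirror
the two expressions quoted above. pick_regime(), its at_is_el2 parameter and
the enum names are hypothetical: they follow the wi->regime idea quoted further
down and my reading of which regime each AT instruction targets, so treat them
as assumptions rather than a proposed implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical regime encoding; names follow the suggestion in this thread. */
enum trans_regime {
	REGIME_EL10,	/* EL1&0 */
	REGIME_EL2,	/* EL2, E2H=0 */
	REGIME_EL20,	/* EL2&0, E2H=1 */
};

/* 'el' as computed in handle_at_slow() in this patch. */
static int compute_el(bool e2h, bool tge)
{
	return (e2h && tge) ? 2 : 1;
}

/* 'nvhe' as computed in setup_s1_walk() in this patch. */
static bool compute_nvhe(int el, bool e2h)
{
	return el == 2 && !e2h;
}

/* With the two expressions above, nvhe is false for every {E2H,TGE} value. */
static void check_nvhe_never_set(void)
{
	for (int e2h = 0; e2h <= 1; e2h++)
		for (int tge = 0; tge <= 1; tge++)
			assert(!compute_nvhe(compute_el(e2h, tge), e2h));
}

/*
 * Assumed alternative: derive the regime from both the instruction and
 * the HCR_EL2 bits. 'at_is_el2' stands for an EL2-specific AT instruction
 * (e.g. AT S1E2R); it is a placeholder, not an existing kernel helper.
 */
static enum trans_regime pick_regime(bool at_is_el2, bool e2h, bool tge)
{
	if (at_is_el2)
		return e2h ? REGIME_EL20 : REGIME_EL2;

	return (e2h && tge) ? REGIME_EL20 : REGIME_EL10;
}
```

With something along those lines, the E2H=0 case of an EL2-specific AT would
end up as REGIME_EL2 and could pick SCTLR_EL2, TCR_EL2 and TTBR0_EL2 rather
than the EL1 registers.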

Thanks,
Alex

> 
> >
> > > 
> > > > I'm curious about what 'el' represents. The translation regime for the AT
> > > > instruction?
> > > 
> > > Exactly that.
> > 
> > Might I make a suggestion here? I was thinking about dropping the (el, wi->nvhe)
> > tuple to represent the translation regime and have a wi->regime (or similar) to
> > unambiguously encode the regime. The value can be an enum with three values to
> > represent the three possible regimes (REGIME_EL10, REGIME_EL2, REGIME_EL20).
> 
> I've been thinking of that, but I'm wondering whether that just
> results in pretty awful code in the end, because we go from 2 cases
> (el==1 or el==2) to 3. But most of the time, we don't care about the
> E2H=0 case, because we can handle it just like E2H=1.
> 
> I'll give it a go and see what it looks like.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.



