On Wed, 31 Jul 2024 10:53:14 +0100,
Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>
> Hi,
>
> On Wed, Jul 31, 2024 at 09:55:28AM +0100, Marc Zyngier wrote:
> > On Mon, 29 Jul 2024 16:26:00 +0100,
> > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
> > >
> > > Hi Marc,
> > >
> > > On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> > > > In order to plug the brokenness of our current AT implementation,
> > > > we need a SW walker that is going to... err.. walk the S1 tables
> > > > and tell us what it finds.
> > > >
> > > > Of course, it builds on top of our S2 walker, and share similar
> > > > concepts. The beauty of it is that since it uses kvm_read_guest(),
> > > > it is able to bring back pages that have been otherwise evicted.
> > > >
> > > > This is then plugged in the two AT S1 emulation functions as
> > > > a "slow path" fallback. I'm not sure it is that slow, but hey.
> > > >
> > > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> > > > ---
> > > >  arch/arm64/kvm/at.c | 538 ++++++++++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 520 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> > > > index 71e3390b43b4c..8452273cbff6d 100644
> > > > --- a/arch/arm64/kvm/at.c
> > > > +++ b/arch/arm64/kvm/at.c
> > > > @@ -4,9 +4,305 @@
> > > >   * Author: Jintack Lim <jintack.lim@xxxxxxxxxx>
> > > >   */
> > > >
> > > > +#include <linux/kvm_host.h>
> > > > +
> > > > +#include <asm/esr.h>
> > > >  #include <asm/kvm_hyp.h>
> > > >  #include <asm/kvm_mmu.h>
> > > >
> > > > +struct s1_walk_info {
> > > > +	u64		baddr;
> > > > +	unsigned int	max_oa_bits;
> > > > +	unsigned int	pgshift;
> > > > +	unsigned int	txsz;
> > > > +	int		sl;
> > > > +	bool		hpd;
> > > > +	bool		be;
> > > > +	bool		nvhe;
> > > > +	bool		s2;
> > > > +};
> > > > +
> > > > +struct s1_walk_result {
> > > > +	union {
> > > > +		struct {
> > > > +			u64	desc;
> > > > +			u64	pa;
> > > > +			s8	level;
> > > > +			u8	APTable;
> > > > +			bool	UXNTable;
> > > > +			bool	PXNTable;
> > > > +		};
> > > > +		struct {
> > > > +			u8	fst;
> > > > +			bool	ptw;
> > > > +			bool	s2;
> > > > +		};
> > > > +	};
> > > > +	bool	failed;
> > > > +};
> > > > +
> > > > +static void fail_s1_walk(struct s1_walk_result *wr, u8 fst, bool ptw, bool s2)
> > > > +{
> > > > +	wr->fst		= fst;
> > > > +	wr->ptw		= ptw;
> > > > +	wr->s2		= s2;
> > > > +	wr->failed	= true;
> > > > +}
> > > > +
> > > > +#define S1_MMU_DISABLED		(-127)
> > > > +
> > > > +static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
> > > > +			 struct s1_walk_result *wr, const u64 va, const int el)
> > > > +{
> > > > +	u64 sctlr, tcr, tg, ps, ia_bits, ttbr;
> > > > +	unsigned int stride, x;
> > > > +	bool va55, tbi;
> > > > +
> > > > +	wi->nvhe = el == 2 && !vcpu_el2_e2h_is_set(vcpu);
> > >
> > > Where 'el' is computed in handle_at_slow() as:
> > >
> > > 	/*
> > > 	 * We only get here from guest EL2, so the translation regime
> > > 	 * AT applies to is solely defined by {E2H,TGE}.
> > > 	 */
> > > 	el = (vcpu_el2_e2h_is_set(vcpu) &&
> > > 	      vcpu_el2_tge_is_set(vcpu)) ? 2 : 1;
> > >
> > > I think 'nvhe' will always be false ('el' is 2 only when E2H is
> > > set).
> >
> > Yeah, there is a number of problems here. el should depend on both the
> > instruction (some are EL2-specific) and the HCR control bits. I'll
> > tackle that now.
>
> Yeah, also noticed that how sctlr, tcr and ttbr are chosen in setup_s1_walk()
> doesn't look quite right for the nvhe case.

Are you sure? Assuming the 'el' value is correct (and I think I fixed
that on my local branch), they seem correct to me (we check for va55
early in the function to avoid a later issue). Can you point out what
exactly fails in that logic?

>
> >
> > > I'm curious about what 'el' represents. The translation regime for the AT
> > > instruction?
> >
> > Exactly that.
>
> Might I make a suggestion here? I was thinking about dropping the (el, wi->nvhe*)
> tuple to represent the translation regime and have a wi->regime (or similar) to
> unambiguously encode the regime. The value can be an enum with three values to
> represent the three possible regimes (REGIME_EL10, REGIME_EL2, REGIME_EL20).

I've been thinking of that, but I'm wondering whether that just
results in pretty awful code in the end, because we go from 2 cases
(el==1 or el==2) to 3. But most of the time, we don't care about the
E2H=0 case, because we can handle it just like E2H=1.

I'll give it a go and see what it looks like.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
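
For illustration only, here is a minimal standalone sketch of what the three-value regime encoding discussed above might look like. It is not code from the patch or from any later revision. The enum values reuse the names suggested in the thread; the enum tag, compute_regime(), regime_is_el2() and the plain bool parameters (standing in for the vcpu's HCR_EL2.{E2H,TGE} state and for "this AT instruction is EL2-specific") are all hypothetical. The selection rule is an assumption pieced together from the thread: EL2-specific instructions pick EL2 or EL2&0 based on E2H alone, while the others target EL2&0 only when both E2H and TGE are set.

```c
#include <stdbool.h>

/* Hypothetical encoding of the translation regime an AT instruction targets. */
enum s1_regime {
	REGIME_EL10,	/* EL1&0 regime */
	REGIME_EL2,	/* EL2 regime, E2H=0 */
	REGIME_EL20,	/* EL2&0 regime, E2H=1 */
};

/*
 * Assumed rule (not taken from the patch): we only get here from guest
 * EL2, so the regime depends on the instruction and on {E2H,TGE}.
 *
 * e2h, tge: stand-ins for the guest's HCR_EL2.E2H and HCR_EL2.TGE bits.
 * el2_op:   true for an EL2-specific AT instruction.
 */
static enum s1_regime compute_regime(bool e2h, bool tge, bool el2_op)
{
	if (el2_op)
		return e2h ? REGIME_EL20 : REGIME_EL2;

	return (e2h && tge) ? REGIME_EL20 : REGIME_EL10;
}

/*
 * Hypothetical helper for the common case where E2H=0 and E2H=1 can be
 * handled alike, i.e. the old "el == 2" test.
 */
static bool regime_is_el2(enum s1_regime regime)
{
	return regime != REGIME_EL10;
}
```

With an encoding along these lines, most call sites that tested "el == 2" could use a single predicate such as regime_is_el2(), and only the places that really differ between E2H=0 and E2H=1 would need to compare against REGIME_EL2 explicitly.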