On Sat, Jul 29, 2023 at 3:36 AM Marc Zyngier <maz@xxxxxxxxxx> wrote: > > On Fri, 21 Jul 2023 10:48:27 +0100, > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > > On Fri, Jul 21 2023, Marc Zyngier <maz@xxxxxxxxxx> wrote: > > > > > On Fri, 21 Jul 2023 09:38:23 +0100, > > > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > >> > > >> On Thu, Jul 20 2023, Jing Zhang <jingzhangos@xxxxxxxxxx> wrote: > > >> > No mechanism was provided to userspace to discover if a given idreg or > > >> > any fields of a given idreg is writable. The write to a readonly idreg > > >> > can also succeed (write ignored) without any error if what's written > > >> > is exactly the same as what the idreg holds or if it is a write to > > >> > AArch32 idregs on an AArch64-only system. > > >> > > >> Hm, I'm not sure that's a good thing for the cases where we want to > > >> support mix-and-match userspace and kernels. Userspace may want to know > > >> upfront whether it can actually tweak the contents of an idreg or not > > >> (for example, in the context of using CPU models for compatibility), so > > >> that it can reject or warn about certain configurations that may not > > >> turn out as the user expects. > > >> > > >> > Not sure if it is worth adding an API to return the writable mask for > > >> > idregs, since we want to enable the writable for all allocated > > >> > unhidden idregs eventually. > > >> > > >> We'd enable any new idregs for writing from the start in the future, I > > >> guess? > > >> > > >> I see two approaches here: > > >> - add an API to get a list of idregs with their writable masks > > >> - add a capability "you can write to all idregs whatever you'd expect to > > >> be able to write there architecture wise", which would require to add > > >> support for all idregs prior to exposing that cap > > >> > > >> The second option would be the easier one (if we don't manage to break > > >> it in the future :) > > > > > > I'm not sure the last option is even possible. The architecture keeps > > > allocating new ID registers in the op0==3, op1=={0, 1, 3}, CRn==0, > > > CRm=={0-7}, op2=={0-7} space, so fields that were RES0 until then > > > start having a non-0 value. > > > > > > This could lead to a situation where you move from a system that > > > didn't know about ID_AA64MMFR6_EL1.XYZ to a system that advertises it, > > > and for which the XYZ instruction has another behaviour. Bad things > > > follow. > > > > Hrm :( > > > > > > > > My preference would be a single ioctl that returns the full list of > > > writeable masks in the ID reg range. It is big, but not crazy big > > > (1536 bytes, if I haven't messed up), and includes the non ID_*_EL1 > > > sysreg such as MPIDR_EL1, CTR_EL1, SMIDR_EL1. > > > > > > It would allow the VMM to actively write zeroes to any writable ID > > > register it doesn't know about, or for which it doesn't have anything > > > to restore. It is also relatively future proof, as it covers > > > *everything* the architecture has provisioned for the future (by the > > > time that space is exhausted, I hope none of us will still be involved > > > with this crap). > > > > Famous last words :) > > > > But yes, that should work. This wouldn't be the first ioctl returning a > > long list, and the VMM would just call it once on VM startup to figure > > things out anyway. > > To be clear, see below for what I had in mind. It is of course > untested, and is probably overlooking a number of details, but you'll > hopefully get my drift. I think this has some benefit over the > per-sysreg ioctl, as it covers everything in one go, and is guaranteed > to be exhaustive (until the architecture grows another range of ID > crap). > > Note that we don't necessarily need to restrict ourselves to a single > range either. We could also return some other ranges depending on > additional parameters (Oliver mentioned offline the case of the > PCMEIDx_EL0 registers). > > Thank, > > M. > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 2ca2973abe66..fa79f3651423 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -3589,3 +3589,91 @@ int __init kvm_sys_reg_table_init(void) > > return 0; > } > + > +/* > + * From DDI0487J.a, D19.2.66 ("ID_AA64MMFR2_EL1, AArch64 Memory Model > + * Feature Register 2"): > + * > + * "The Feature ID space is defined as the System register space in > + * AArch64 with op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, > + * op2=={0-7}." > + * > + * This covers all R/O registers that indicate anything useful feature > + * wise, including the ID registers. > + */ > + > +/* Userspace-visible definitions */ > +#define ARM64_FEATURE_ID_SPACE_SIZE (3 * 8 * 8) > +#define __ARM64_FEATURE_ID_SPACE_IDX(op0, op1, crn, crm, op2) \ > + ({ \ > + __u64 __op1 = op1 & 3; \ > + __op1 -= (__op1 == 3); \ > + ((ARM64_SYS_REG_SHIFT_MASK(3, OP0) | \ > + ARM64_SYS_REG_SHIFT_MASK(__op1, OP1) | \ > + ARM64_SYS_REG_SHIFT_MASK(0, CRN) | \ > + ARM64_SYS_REG_SHIFT_MASK(crm & 7, CRM) | \ > + ARM64_SYS_REG_SHIFT_MASK(op2, OP2)) - \ > + (ARM64_SYS_REG_SHIFT_MASK(3, OP0) | \ > + ARM64_SYS_REG_SHIFT_MASK(0, OP1) | \ > + ARM64_SYS_REG_SHIFT_MASK(0, CRN) | \ > + ARM64_SYS_REG_SHIFT_MASK(0, CRM) | \ > + ARM64_SYS_REG_SHIFT_MASK(0, OP2))); \ > + }) > + > +#define ARM64_FEATURE_ID_SPACE_INDEX(r) \ > + __ARM64_FEATURE_ID_SPACE_IDX(sys_reg_Op0(r), \ > + sys_reg_Op1(r), \ > + sys_reg_CRn(r), \ > + sys_reg_CRm(r), \ > + sys_reg_Op2(r)) > + > +struct feature_id_writeable_masks { > + u64 mask[ARM64_FEATURE_ID_SPACE_SIZE]; > +}; > + > +static bool is_feature_id_reg(u32 encoding) > +{ > + return (sys_reg_Op0(encoding) == 3 && > + (sys_reg_Op1(encoding) < 2 || sys_reg_Op1(encoding) == 3) && > + sys_reg_CRn(encoding) == 0 && > + sys_reg_CRm(encoding) <= 7); > +} > + > +int kvm_get_writeable_feature_regs(struct kvm *kvm, u64 __user *masks) > +{ > + /* Wipe the whole thing first */ > + for (int i = 0; i < ARM64_FEATURE_ID_SPACE_SIZE; i++) > + if (put_user(0, masks + i)) > + return -EFAULT; > + > + for (int i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) { > + const struct sys_reg_desc *reg = &sys_reg_descs[i]; > + u32 encoding = reg_to_encoding(reg); > + u64 val; > + > + if (!is_feature_id_reg(encoding) || !reg->set_user) > + continue; > + > + /* > + * For ID registers, we return the writable mask. > + * Other feature registers return a full 64bit mask. > + * That's not necessarily compliant with a given > + * revision of the architecture, but the RES0/RES1 > + * definitions allow us to do that > + */ > + if (is_id_reg(encoding)) { > + if (!reg->val) > + continue; > + > + val = reg->val; > + } else { > + val = ~0UL; > + } > + > + if (put_user(val, > + (masks + ARM64_FEATURE_ID_SPACE_INDEX(encoding)))) > + return -EFAULT; > + } > + > + return 0; > +} Thanks Marc. The whole idea is clear to me now. I'll implement this in the next version. Jing > > > > -- > Without deviation from the norm, progress is not possible.