Hi Reiji, On Thu, Jan 6, 2022 at 4:28 AM Reiji Watanabe <reijiw@xxxxxxxxxx> wrote: > > This patch lays the groundwork to make ID registers writable. > > Introduce struct id_reg_info for an ID register to manage the > register specific control of its value for the guest, and provide set > of functions commonly used for ID registers to make them writable. > > The id_reg_info is used to do register specific initialization, > validation of the ID register and etc. Not all ID registers must > have the id_reg_info. ID registers that don't have the id_reg_info > are handled in a common way that is applied to all ID registers. > > At present, changing an ID register from userspace is allowed only > if the ID register has the id_reg_info, but that will be changed > by the following patches. > > No ID register has the structure yet and the following patches > will add the id_reg_info for some ID registers. > > Signed-off-by: Reiji Watanabe <reijiw@xxxxxxxxxx> > --- > arch/arm64/kvm/sys_regs.c | 243 ++++++++++++++++++++++++++++++++++++-- > 1 file changed, 230 insertions(+), 13 deletions(-) > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 80dc62f98ef0..971018288bee 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -265,6 +265,101 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu, > return read_zero(vcpu, p); > } > > +struct id_reg_info { > + u32 sys_reg; /* Register ID */ Nit: Why not have the comment above, as for the other fields of this struct? > + > + /* > + * Limit value of the register for a vcpu. The value is the sanitized > + * system value with bits cleared for unsupported features for the > + * guest. > + */ I think that rather than saying "with bits cleared for unsupported features", it might be better to rephrase along the lines of "with bits indicating" or "set/cleared to indicate", so that it applies to signed fields as well. > + u64 vcpu_limit_val; > + > + /* Fields that we don't to validate by arm64_check_features. */ Nit: Remove "to". > + u64 ignore_mask; > + > + /* Initialization function of the id_reg_info */ > + void (*init)(struct id_reg_info *id_reg); > + /* > + * This is an optional ID register specific validation function. > + * When userspace tries to set the ID register, arm64_check_features() > + * will check if the requested value indicates any features that cannot > + * be supported by KVM on the host. But, some ID register fields need > + * a special checking, and this function can be used for such fields. > + * e.g. When SVE is configured for a vCPU by KVM_ARM_VCPU_INIT, > + * ID_AA64PFR0_EL1.SVE shouldn't be set to 0 for the vCPU. > + * The validation function for ID_AA64PFR0_EL1 could be used to check > + * the field is consistent with SVE configuration. > + */ > + int (*validate)(struct kvm_vcpu *vcpu, const struct id_reg_info *id_reg, > + u64 val); > + > + /* > + * Return a bitmask of the vCPU's ID register fields that are not > + * synced with saved (per VM) ID register value, which usually > + * indicates opt-in CPU features that is not configured for the vCPU. Nit: s/is/are > + * ID registers are saved per VM, but some opt-in CPU features can > + * be configured per vCPU. The saved (per VM) values for such > + * features are for vCPUs with the features (and zero for > + * vCPUs without the features). > + * Return value of this function is used to handle such fields > + * for per vCPU ID register read/write request with saved per VM > + * ID register. See the __write_id_reg's comment for more detail. > + */ > + u64 (*vcpu_mask)(const struct kvm_vcpu *vcpu, > + const struct id_reg_info *id_reg); > +}; > + > +static void id_reg_info_init(struct id_reg_info *id_reg) > +{ > + id_reg->vcpu_limit_val = read_sanitised_ftr_reg(id_reg->sys_reg); > + if (id_reg->init) If there is an id_reg then the init function is necessary, isn't it? Otherwise it doesn't seem to add more than the default handling. If that's right, should we instead ensure that init is always set? > + id_reg->init(id_reg); > +} > + > +/* > + * An ID register that needs special handling to control the value for the > + * guest must have its own id_reg_info in id_reg_info_table. > + * (i.e. the reset value is different from the host's sanitized value, > + * the value is affected by opt-in features, some fields need specific > + * validation, etc.) > + */ > +#define GET_ID_REG_INFO(id) (id_reg_info_table[IDREG_IDX(id)]) > +static struct id_reg_info *id_reg_info_table[KVM_ARM_ID_REG_MAX_NUM] = {}; > + > +static int validate_id_reg(struct kvm_vcpu *vcpu, u32 id, u64 val) > +{ > + const struct id_reg_info *id_reg = GET_ID_REG_INFO(id); > + u64 limit, tmp_val; > + int err; > + > + if (id_reg) { > + limit = id_reg->vcpu_limit_val; > + /* > + * Replace the fields that are indicated in ignore_mask with > + * the value in the limit to not have arm64_check_features() > + * check the field in @val. > + */ > + tmp_val = val & ~id_reg->ignore_mask; > + tmp_val |= (limit & id_reg->ignore_mask); > + } else { > + limit = read_sanitised_ftr_reg(id); > + tmp_val = val; > + } > + > + /* Check if the value indicates any feature that is not in the limit. */ > + err = arm64_check_features(id, tmp_val, limit); > + if (err) > + return err; > + > + if (id_reg && id_reg->validate) > + /* Run the ID register specific validity check. */ > + err = id_reg->validate(vcpu, id_reg, val); > + > + return err; > +} > + > /* > * ARMv8.1 mandates at least a trivial LORegion implementation, where all the > * RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0 > @@ -1061,9 +1156,81 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu, > return true; > } > > +static u64 read_kvm_id_reg(struct kvm *kvm, u32 id) > +{ > + return kvm->arch.id_regs[IDREG_IDX(id)]; > +} > + > +static int modify_kvm_id_reg(struct kvm *kvm, u32 id, u64 val, > + u64 preserve_mask) > +{ > + u64 old, new; > + > + mutex_lock(&kvm->lock); > + > + old = kvm->arch.id_regs[IDREG_IDX(id)]; > + > + /* Preserve the value at the bit position set in preserve_mask */ > + new = old & preserve_mask; > + new |= (val & ~preserve_mask); > + > + /* Don't allow to modify ID register value after KVM_RUN on any vCPUs */ > + if (kvm_vm_has_started(kvm) && new != old) > + return -EBUSY; This path doesn't release the lock. I see that it gets fixed in a future patch, but it might be good to fix it here... > + > + WRITE_ONCE(kvm->arch.id_regs[IDREG_IDX(id)], new); > + mutex_unlock(&kvm->lock); > + > + return 0; > +} > + > +static int write_kvm_id_reg(struct kvm *kvm, u32 id, u64 val) > +{ > + return modify_kvm_id_reg(kvm, id, val, 0); > +} > + > + > +/* > + * KVM basically forces all vCPUs of the guest to have a uniform value for > + * each ID register (, which means KVM_SET_ONE_REG for a vCPU affects all > + * the vCPUs of the guest), and the id_regs[] of kvm_arch holds values > + * of ID registers for the guest. However, there is an exception for > + * ID register fields corresponding to CPU features that can be > + * configured per vCPU by KVM_ARM_VCPU_INIT, or etc (e.g. PMUv3, SVE, etc). > + * For such fields, all vCPUs that have the feature will have a non-zero > + * uniform value (, which can be updated by userspace), but the vCPUs that Nit: uneven nesting of parentheses and commas :) > + * don't have the feature will have zero for the fields. > + * Values that @id_regs holds are for vCPUs that have such features. So, > + * to get the ID register value for a vCPU that doesn't have those features, > + * the corresponding fields in id_regs[] needs to be cleared. > + * A bitmask of the fields are provided by id_reg_info's vcpu_mask(), and > + * __write_id_reg() and __read_id_reg() take care of those fields using > + * the bitmask. > + */ > +static int __write_id_reg(struct kvm_vcpu *vcpu, u32 id, u64 val) > +{ > + const struct id_reg_info *id_reg = GET_ID_REG_INFO(id); > + u64 mask = 0; > + > + if (id_reg && id_reg->vcpu_mask) > + mask = id_reg->vcpu_mask(vcpu, id_reg); > + > + /* > + * Update the ID register for the guest with @val, except for fields > + * that are set in the mask, which indicates fields for opt-in > + * features that are not configured for the vCPU. > + */ > + return modify_kvm_id_reg(vcpu->kvm, id, val, mask); > +} > + > static u64 __read_id_reg(const struct kvm_vcpu *vcpu, u32 id) > { > - u64 val = vcpu->kvm->arch.id_regs[IDREG_IDX(id)]; > + const struct id_reg_info *id_reg = GET_ID_REG_INFO(id); > + u64 val = read_kvm_id_reg(vcpu->kvm, id); > + > + if (id_reg && id_reg->vcpu_mask) > + /* Clear fields for opt-in features that are not configured. */ > + val &= ~(id_reg->vcpu_mask(vcpu, id_reg)); > switch (id) { > case SYS_ID_AA64PFR0_EL1: > @@ -1222,12 +1389,7 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > return 0; > } > > -/* > - * cpufeature ID register user accessors > - * > - * For now, these registers are immutable for userspace, so for set_id_reg() > - * we don't allow the effective value to be changed. > - */ > +/* cpufeature ID register user accessors */ > static int __get_id_reg(const struct kvm_vcpu *vcpu, > const struct sys_reg_desc *rd, void __user *uaddr, > bool raz) > @@ -1238,11 +1400,31 @@ static int __get_id_reg(const struct kvm_vcpu *vcpu, > return reg_to_user(uaddr, &val, id); > } > > +/* > + * Check if the given id indicates AArch32 ID register encoding. > + */ > +static bool is_aarch32_id_reg(u32 id) > +{ > + u32 crm, op2; > + > + if (!is_id_reg(id)) > + return false; > + > + crm = sys_reg_CRm(id); > + op2 = sys_reg_Op2(id); > + if (crm == 1 || crm == 2 || (crm == 3 && (op2 != 3 && op2 != 7))) Consistent with the Arm ARM "Table D12-2 System instruction encodings for non-Debug System register accesses" > + /* AArch32 ID register */ > + return true; > + > + return false; > +} > + > static int __set_id_reg(struct kvm_vcpu *vcpu, > const struct sys_reg_desc *rd, void __user *uaddr, > bool raz) > { > const u64 id = sys_reg_to_index(rd); > + u32 encoding = reg_to_encoding(rd); > int err; > u64 val; > > @@ -1250,11 +1432,28 @@ static int __set_id_reg(struct kvm_vcpu *vcpu, > if (err) > return err; > > - /* This is what we mean by invariant: you can't change it. */ > - if (val != read_id_reg(vcpu, rd, raz)) > + if (val == read_id_reg(vcpu, rd, raz)) > + /* The value is same as the current value. Nothing to do. */ > + return 0; > + > + /* > + * Don't allow to modify the register's value if the register is raz, > + * or the reg doesn't have the id_reg_info. > + */ > + if (raz || !GET_ID_REG_INFO(encoding)) > return -EINVAL; > > - return 0; > + /* > + * Skip the validation of AArch32 ID registers if the system doesn't > + * 32bit EL0 (their value are UNKNOWN). > + */ > + if (system_supports_32bit_el0() || !is_aarch32_id_reg(encoding)) { > + err = validate_id_reg(vcpu, encoding, val); > + if (err) > + return err; > + } > + > + return __write_id_reg(vcpu, encoding, val); > } > > static int get_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, > @@ -2816,6 +3015,20 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices) > return write_demux_regids(uindices); > } > > +static void id_reg_info_init_all(void) > +{ > + int i; > + struct id_reg_info *id_reg; > + > + for (i = 0; i < ARRAY_SIZE(id_reg_info_table); i++) { > + id_reg = (struct id_reg_info *)id_reg_info_table[i]; > + if (!id_reg) > + continue; > + > + id_reg_info_init(id_reg); > + } > +} > + > void kvm_sys_reg_table_init(void) > { > unsigned int i; > @@ -2850,6 +3063,8 @@ void kvm_sys_reg_table_init(void) > break; > /* Clear all higher bits. */ > cache_levels &= (1 << (i*3))-1; > + > + id_reg_info_init_all(); > } > > /* > @@ -2862,11 +3077,12 @@ void set_default_id_regs(struct kvm *kvm) > u32 id; > const struct sys_reg_desc *rd; > u64 val; > + struct id_reg_info *idr; > > for (i = 0; i < ARRAY_SIZE(sys_reg_descs); i++) { > rd = &sys_reg_descs[i]; > if (rd->access != access_id_reg) > - /* Not ID register, or hidden/reserved ID register */ > + /* Not ID register or hidden/reserved ID register */ > continue; > > id = reg_to_encoding(rd); > @@ -2874,7 +3090,8 @@ void set_default_id_regs(struct kvm *kvm) > /* Shouldn't happen */ > continue; > > - val = read_sanitised_ftr_reg(id); > - kvm->arch.id_regs[IDREG_IDX(id)] = val; > + idr = GET_ID_REG_INFO(id); > + val = idr ? idr->vcpu_limit_val : read_sanitised_ftr_reg(id); > + (void)write_kvm_id_reg(kvm, id, val); Rather than ignoring the return value of write_kvm_id_reg(), wouldn't it be better if set_default_id_regs were to propagate it back to kvm_arch_init_vm in case there's a problem? Thanks, /fuad > } > } > -- > 2.34.1.448.ga2b2bfdf31-goog > > _______________________________________________ > kvmarm mailing list > kvmarm@xxxxxxxxxxxxxxxxxxxxx > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm