On Tue, Nov 12, 2024, Dionna Glaze wrote: > @@ -109,6 +110,10 @@ static void *sev_init_ex_buffer; > */ > static struct sev_data_range_list *snp_range_list; > > +/* SEV ASID data tracks resources associated with an ASID to safely manage operations. */ > +struct sev_asid_data *sev_asid_data; > +u32 nr_asids, sev_min_asid, sev_es_max_asid; > + > static inline bool sev_version_greater_or_equal(u8 maj, u8 min) > { > struct sev_device *sev = psp_master->sev_data; > @@ -1093,6 +1098,109 @@ static int snp_filter_reserved_mem_regions(struct resource *rs, void *arg) > return 0; > } > > +static bool sev_check_external_user(int fd); > +void *sev_snp_create_context(int fd, int asid, int *psp_ret) > +{ > + struct sev_data_snp_addr data = {}; > + void *context; > + int rc, error; > + > + if (!sev_check_external_user(fd)) > + return ERR_PTR(-EBADF); > + > + if (!sev_asid_data) > + return ERR_PTR(-ENODEV); > + > + if (asid < 0 || asid >= nr_asids) > + return ERR_PTR(-EINVAL); > + > + /* Can't create a context for a used ASID. */ > + if (WARN_ON_ONCE(sev_asid_data[asid].snp_context)) > + return ERR_PTR(-EBUSY); Tracking contexts in an array that's indexed per ASID is unsafe and unnecessarily splits ASID management across KVM and the PSP driver. There is zero reason the PSP driver needs to care about ASIDs. Attempting to police KVM is futile, and leads to bloated, convoluted code. AFAICT, there is nothing to guard against a use-after-free in snp_update_guest_contexts(). The need to handle SEV_RET_INVALID_GUEST is a pretty big clue that there are races between KVM and firmware updates. if (!sev_asid_data[i].snp_context) continue; status_data.gctx_paddr = __psp_pa(sev_asid_data[i].snp_context); status_data.address = __psp_pa(snp_guest_status); rc = sev_do_cmd(SEV_CMD_SNP_GUEST_STATUS, &status_data, &error); if (!rc) continue; /* * Handle race with SNP VM being destroyed/decommissoned, * if guest context page invalid error is returned, * assume guest has been destroyed. */ if (error == SEV_RET_INVALID_GUEST) continue; Using an array is also inefficient, as it requires iterating over all possible ASIDs, many of which may be unused. Furthermore, handling this in the PSP driver (correctly) leads to unnecessary locking. KVM already protects SNP ASID allocations with sev_deactivate_lock, I see zero reason to complicate things with another lock. The "rollback" mechanism is also broken. If SEV_CMD_SNP_GUEST_STATUS fails, synthetic_restore_required is set and never cleared, and impacts *all* SEV PSP commands. I.e. failure to update one guest context comletely cripples the entire system. Not to mention synthetic_restore_required also lacks any form of SMP synchronication. I also don't see how a rollback is possible if an error occurs after one or more guest contexts have been updated. Presumably trying to rollback in that state will leave the updated guests in a bad state. Of course, I don't see any rollback code as nothing ever clears synthetic_restore_required, so what's intented to happen is entirely unclear. I also don't see anything in this series that explains why a SEV_CMD_SNP_GUEST_STATUS failure shouldn't be treated as a fatal error. Of the error codes listed in the SNP ABI, everything except UPDATE_FAILED is clearly a software bug. And I can't find anything that explains when UPDATE_FAILED will be returned. Table 80. Status Codes for SNP_GUEST_STATUS Status Condition SUCCESS Successful completion. INVALID_PLATFORM_STATE The platform is not in the INIT state. INVALID_ADDRESS The address is invalid for use by the firmware. INVALID_PARAM MBZ fields are not zero. INVALID_GUEST The guest context page was invalid. INVALID_PAGE_STATE The guest status page was not in the correct state. INVALID_PAGE_SIZE The guest status page was not the correct size. UPDATE_FAILED Update of the firmware internal state or a guest context page has failed. Somewhat off the cuff, I think the only sane way to approach this is to call into KVM when doing a firmware update, and let KVM react accordingly. E.g. let KVM walk its list of VMs in order to update SNP VMs, taking kvm_lock and the somewhat misnamed sev_deactivate_lock() as needed. Then if updating a guest context fails, terminate _that_ VM, and move on to the next VM. Side topic, I don't see any code that ensures no SEV or SEV-ES VMs are running. Is the idea to let userspace throw noodles at the PSP and see what sticks? + Provide support for AMD Platform Security Processor firmware. + The PSP firmware can be updated while no SEV or SEV-ES VMs are active. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Users of this feature should be aware of the error modes that indicate + required manual rollback or reset due to instablity.