On Thu, Oct 20, 2022 at 8:02 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
>
> On 10/19/22 16:47, Peter Gonda wrote:
> > On Wed, Oct 19, 2022 at 2:58 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >> On 10/19/22 15:39, Peter Gonda wrote:
> >>> On Wed, Oct 19, 2022 at 1:56 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >>>> On 10/19/22 14:17, Dionna Amalie Glaze wrote:
> >>>>> On Wed, Oct 19, 2022 at 11:44 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >>>>>> On 10/19/22 12:40, Peter Gonda wrote:
> >>>>>>> On Wed, Oct 19, 2022 at 11:03 AM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >>>>>>>> On 10/19/22 10:03, Peter Gonda wrote:
> >>>>>>>>> The ASP and an SNP guest use a series of AES-GCM keys called VMPCKs to
> >>>>>>>>> communicate securely with each other. The IV to this scheme is a
> >>>>>>>>> sequence number that both the ASP and the guest track. Currently this
> >>>>>>>>> sequence number in a guest request must exactly match the sequence
> >>>>>>>>> number tracked by the ASP. This means that if the guest sees an error
> >>>>>>>>> from the host during a request, it can only retry that exact request or
> >>>>>>>>> disable the VMPCK to prevent an IV reuse. AES-GCM cannot tolerate IV
> >>>>>>>>> reuse; see:
> >>>>>>>>> https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/comments/800-38-series-drafts/gcm/joux_comments.pdf
>
> >>> OK, so the guest retries with the same request when it gets an
> >>> SNP_GUEST_REQ_INVALID_LEN error. It expands its internal buffer to
> >>
> >> It would just use the pre-allocated snp_dev->certs_data buffer with npages
> >> set to the full size of that buffer.
> >
> > Actually, we allocate that buffer with size SEV_FW_BLOB_MAX_SIZE. Maybe
> > we want to just allocate this buffer, which we think is sufficient, and
> > never increase the allocation?
> >
> > I see the size of
> > https://developer.amd.com/wp-content/resources/ask_ark_milan.cert is
> > 3200 bytes.
> > Assuming the VCEK cert is the same size (which it should
> > be, since this .cert is 2 certificates), 16K seems to leave enough room
> > even for some vendor certificates?
>
> I think just using the 16K buffer (4 pages) as it is allocated today is
> ok. If we get a SNP_GUEST_REQ_INVALID_LEN error that is larger than 4
> pages, then we won't ever be able to pull the certs given how the driver
> is coded today. In that case, disabling the VMPCK is in order.
>
> A separate patch could be submitted later to improve this overall aspect
> of the certs buffer if needed.

If that sounds OK, I'd prefer that. This keeps the driver's current limit:

static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
...
	if (req.certs_len > SEV_FW_BLOB_MAX_SIZE ||
	    !IS_ALIGNED(req.certs_len, PAGE_SIZE))
		return -EINVAL;

I'd prefer not to add extra features during the bug fix, but I'm happy to make
this work with buffers greater than SEV_FW_BLOB_MAX_SIZE as a follow-up if you
want.

> Thanks,
> Tom
>
> >>> hold the certificates. When it finally gets a successful request w/
> >>> certs, do we want to return the attestation bits to userspace but
> >>> leave out the certificate data, or just error out the ioctl
> >>> completely?
> >>
> >> We need to be able to return the attestation bits that came back with the
> >> extra certs. So just error out of the ioctl with the length error and let
> >> user-space retry with the recommended number of pages.
> >
> > That sounded simpler to me. Will do.
> >
> >>> I can do that in this series.
> >>
> >> Thanks!
> >>
> >>>>>>>> For the rate-limiting patch series [1], the rate-limiting will have to be
> >>>>>>>> performed within the kernel, while the mutex is held, and then retry the
> >>>>>>>> exact request again. Otherwise, that error will require disabling the
> >>>>>>>> VMPCK. Either that, or the hypervisor must provide the rate limiting.
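The buffer policy above (use the 16K pre-allocated buffer on a retry, and give up if even that is too small) can be modeled as a small user-space sketch. Everything here is a hypothetical stand-in, not the driver's actual code: hv_certs_len, hv_request(), and get_certs() are invented names, and -1/-2 stand in for SNP_GUEST_REQ_INVALID_LEN and the disable-VMPCK path.

```c
#include <assert.h>
#include <stddef.h>

#define SEV_FW_BLOB_MAX_SIZE (4 * 4096)	/* the 16K (4-page) pre-allocated certs buffer */

/* Hypothetical model of the host side: how many certificate bytes it has. */
static size_t hv_certs_len;

/* Returns 0 on success, -1 standing in for SNP_GUEST_REQ_INVALID_LEN. */
static int hv_request(size_t buf_len, size_t *needed)
{
	if (buf_len < hv_certs_len) {
		*needed = hv_certs_len;
		return -1;
	}
	return 0;
}

/*
 * Guest-side policy from the thread: on an invalid-length error, retry
 * once with the whole pre-allocated buffer.  If even that is too small,
 * give up (the real driver would disable the VMPCK), modeled here as -2.
 * Returns 0 if the first attempt succeeds, 1 if the retry succeeds.
 */
static int get_certs(size_t user_len)
{
	size_t needed;

	if (!hv_request(user_len, &needed))
		return 0;
	if (needed > SEV_FW_BLOB_MAX_SIZE)
		return -2;
	return hv_request(SEV_FW_BLOB_MAX_SIZE, &needed) ? -2 : 1;
}
```

In this model, a cert chain around the 3200-byte ASK/ARK size mentioned above fits on the first attempt, while a blob larger than 16K can never be pulled and hits the disable-VMPCK path, matching the limitation described in the thread.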
> >>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> [1] https://lore.kernel.org/lkml/20221013160040.2858732-1-dionnaglaze@xxxxxxxxxx/
> >>>>>>>
> >>>>>>> Yes, I think if the host rate limits the guest, the guest kernel should
> >>>>>>> retry the exact message. Which mutex are you referring to?
> >>>>>>
> >>>>>> Or the host waits and then submits the request, and the guest kernel
> >>>>>> doesn't have to do anything. The mutex I'm referring to is the
> >>>>>> snp_cmd_mutex that is taken in snp_guest_ioctl().
> >>>>>
> >>>>> I think that either the host kernel or the guest kernel waiting can lead
> >>>>> to unacceptable delays.
> >>>>> I would recommend that we add a zero-argument ioctl to /dev/sev-guest
> >>>>> specifically for retrying the last request.
> >>>>>
> >>>>> We can know what the last request is due to the sev_cmd_mutex serialization.
> >>>>> The driver will just keep a scratch buffer for this. Any other request
> >>>>> that comes in without resolving the retry will get an -EBUSY error
> >>>>> code.
> >>>>
> >>>> And the first caller will have received an -EAGAIN in order to
> >>>> differentiate between the two situations?
> >>>>
> >>>>> Calling the retry ioctl without a pending command will result in -EINVAL.
> >>>>>
> >>>>> Let me know what you think.
> >>>>
> >>>> I think that sounds reasonable, but there are some catches. You will need
> >>>> to ensure that the caller that is supposed to retry does actually retry,
> >>>> and that a caller that does retry is the same caller that was told to retry.
> >>>
> >>> What's the issue with the guest driver taking some time?
> >>>
> >>> This sounds complex because there may be many users of the driver. How
> >>> do multiple users coordinate when they need to use the retry ioctl?
> >>>
> >>>> Thanks,
> >>>> Tom
> >>>>
> >>>>>> Thanks,
> >>>>>> Tom
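The retry-ioctl semantics proposed in the thread (-EAGAIN for the throttled caller, -EBUSY for any other request while a retry is pending, -EINVAL for a retry with nothing pending) can be sketched as a small state machine. All names here (do_request(), do_retry(), pending_id, throttled) are illustrative user-space stand-ins, not actual driver code, and the caller-identity check Tom raises is deliberately left out.

```c
#include <assert.h>
#include <errno.h>

static int pending_id;		/* 0: no request waiting to be retried */
static int throttled = 1;	/* hypothetical host rate-limit state */

/* A normal request: refused with -EBUSY while a different request is
 * pending retry; parked with -EAGAIN when the host throttles it. */
static int do_request(int id)
{
	if (pending_id && id != pending_id)
		return -EBUSY;
	if (throttled) {
		pending_id = id;
		return -EAGAIN;
	}
	pending_id = 0;
	return 0;
}

/* The proposed zero-argument retry ioctl: replays the pending request. */
static int do_retry(void)
{
	if (!pending_id)
		return -EINVAL;
	return do_request(pending_id);
}
```

Note that this sketch still leaves Tom's catch unaddressed: nothing ties the retry back to the specific caller that received -EAGAIN, so any user of the device could issue the replay.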