On Fri, Oct 21, 2022 at 3:27 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
>
> On 10/21/22 15:57, Peter Gonda wrote:
> > On Fri, Oct 21, 2022 at 1:02 PM Tom Lendacky <thomas.lendacky@xxxxxxx> wrote:
> >> On 10/21/22 12:33, Peter Gonda wrote:
> >>> The ASP and an SNP guest use a series of AES-GCM keys called VMPCKs to
> >>> communicate securely with each other. The IV to this scheme is a
> >>> sequence number that both the ASP and the guest track. Currently this
> >>> sequence number in a guest request must exactly match the sequence
> >>> number tracked by the ASP. This means that if the guest sees an error
> >>> from the host during a request it can only retry that exact request or
> >>> disable the VMPCK to prevent an IV reuse. AES-GCM cannot tolerate IV
> >>> reuse, see:
> >>> https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents/bcm/comments/800-38-series-drafts/gcm/joux_comments.pdf
> >>>
> >>> To handle userspace querying the cert_data length, instead of requesting
> >>> the cert length from userspace, use the size of the driver's allocated
> >>> shared buffer. Then copy that buffer to userspace, or give userspace an
> >>> error depending on the size of the buffer given by userspace.
> >>>
> >>> Fixes: fce96cf044308 ("virt: Add SEV-SNP guest driver")
> >>> Signed-off-by: Peter Gonda <pgonda@xxxxxxxxxx>
> >>> Reported-by: Peter Gonda <pgonda@xxxxxxxxxx>
> >>> Reviewed-by: Dionna Glaze <dionnaglaze@xxxxxxxxxx>
> >>> Cc: Borislav Petkov <bp@xxxxxxx>
> >>> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
> >>> Cc: Michael Roth <michael.roth@xxxxxxx>
> >>> Cc: Haowen Bai <baihaowen@xxxxxxxxx>
> >>> Cc: Yang Yingliang <yangyingliang@xxxxxxxxxx>
> >>> Cc: Marc Orr <marcorr@xxxxxxxxxx>
> >>> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> >>> Cc: Ashish Kalra <Ashish.Kalra@xxxxxxx>
> >>> Cc: linux-kernel@xxxxxxxxxxxxxxx
> >>> Cc: kvm@xxxxxxxxxxxxxxx
> >>> ---
>
> >>> @@ -477,25 +496,37 @@ static int get_ext_report(struct snp_guest_dev *snp_dev, struct snp_guest_reques
> >>>   	if (!resp)
> >>>   		return -ENOMEM;
> >>>
> >>> -	snp_dev->input.data_npages = npages;
> >>> +	snp_dev->input.data_npages = sizeof(*snp_dev->certs_data) >> PAGE_SHIFT;
> >>>   	ret = handle_guest_request(snp_dev, SVM_VMGEXIT_EXT_GUEST_REQUEST, arg->msg_version,
> >>>   				   SNP_MSG_REPORT_REQ, &req.data,
> >>>   				   sizeof(req.data), resp->data, resp_len, &arg->fw_err);
> >>>
> >>> +	resp_cert_len = snp_dev->input.data_npages << PAGE_SHIFT;
> >>
> >> The hypervisor is not required to update the number of pages that the
> >> certificates actually used/required if enough pages were supplied. So this
> >> value could always remain as 4 (based on SEV_FW_BLOB_MAX_SIZE) on
> >> successful return.
> >>
> >> And if that's the case, we could always just return 4 for the number of
> >> pages no matter what. Otherwise you'll have to update the logic here if
> >> you want to obtain the actual number.
> >
> > Are you asking for this to just hard-code the userspace requirement to
> > 4 pages? We could leave this as written here; that would leave the
> > guest open to a new GHCB spec where
>
> It's up to you. Ideally, if userspace provides an npages value of 0, then
> the driver issues the request with 0 and gets back the actual value. Then,
> to ensure the sequence numbers are updated, you issue the request again
> with either the just-returned value or SEV_FW_BLOB_MAX_SIZE >>
> PAGE_SHIFT. That will update the sequence numbers and the driver returns
> the number of pages required as returned from the first request.
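Just to check my understanding, is the flow you're describing roughly the
sketch below? This isn't meant as the actual patch: probe_cert_npages and
its parameter list are placeholders, the request call is only shaped like
the one in the diff above, and error handling is trimmed.

/*
 * Rough sketch of the two-pass flow: probe with 0 pages to learn the
 * required size, then re-issue with the full shared buffer so the VMPCK
 * sequence numbers still advance. Not the real driver code.
 */
static int probe_cert_npages(struct snp_guest_dev *snp_dev,
			     struct snp_guest_request_ioctl *arg,
			     struct snp_ext_report_req *req,
			     struct snp_report_resp *resp, u32 resp_len,
			     unsigned int *npages_out)
{
	int ret;

	/* Pass 1: 0 pages makes the hypervisor report the size it needs. */
	snp_dev->input.data_npages = 0;
	ret = handle_guest_request(snp_dev, SVM_VMGEXIT_EXT_GUEST_REQUEST,
				   arg->msg_version, SNP_MSG_REPORT_REQ,
				   &req->data, sizeof(req->data),
				   resp->data, resp_len, &arg->fw_err);
	if (arg->fw_err != SNP_GUEST_REQ_INVALID_LEN)
		return ret;

	/* The hypervisor wrote back the number of pages it actually needs. */
	*npages_out = snp_dev->input.data_npages;

	/*
	 * Pass 2: retry with the whole shared buffer so the request succeeds
	 * and the sequence numbers advance; report *npages_out to userspace.
	 */
	snp_dev->input.data_npages = SEV_FW_BLOB_MAX_SIZE >> PAGE_SHIFT;
	return handle_guest_request(snp_dev, SVM_VMGEXIT_EXT_GUEST_REQUEST,
				    arg->msg_version, SNP_MSG_REPORT_REQ,
				    &req->data, sizeof(req->data),
				    resp->data, resp_len, &arg->fw_err);
}
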
>
> That number can also be cached and then whenever userspace calls down with
> npages of 0, immediately return the cached value. If the request ever gets
> a SNP_GUEST_REQ_INVALID_LEN with the cached value, the newly returned
> value gets cached and the driver performs the request again, like the very
> first time, in order to update the sequence numbers. The driver would then
> return the new npages value back to userspace. Of course that becomes the
> "minimum" number of pages now, so even if the hypervisor reduces the size
> of the certs data, it always requires more than needed.
>
> >
> > "State from Hypervisor: on error will contain the number of guest
> > contiguous pages required to hold the data to be returned"
> >
> > Is instead:
> >
> > "State from Hypervisor: contain the number of guest contiguous pages
> > required to hold the data to be returned"
>
> If the driver always returns 4, I don't see this as requiring any change
> to the spec. It may be more than is truly needed, but that doesn't matter;
> the cert data will fit, it just may be more than necessary. This can occur
> even if you pass back the actual number, if, in between the call with 0,
> the hypervisor updates the certs such that fewer pages are required.
>
> >
> > I think this would be a non-breaking change since the spec says
> > nothing of the non-error case currently. Fine with your preference
> > here. Either Dionna or I can follow up with a series to allow more
> > than 4 pages if needed.
>
> I'd prefer that userspace get the actual number, but really, I don't think
> it's a big deal to just return 4 until the driver can handle a more
> dynamic approach should more than 4 pages ever be needed (this would also
> require support on the hypervisor, where currently not more than 4 pages
> are allowed to be provided, too).
>
> I just wanted you to be aware that in testing you're likely to see 4
> always returned to userspace.

So if we want userspace to get the actual number, I think we need the
host to tell us the actual number. Currently userspace only gets an upper
bound, since these requests race against host changes:

0. Host updates cert_data to be 10 pages
1. Guest requests SNP EXT REQ for a length query
2. Host returns 10 pages
3. Host updates cert_data to be 2 pages
4. Guest requests SNP EXT REQ with a 10-page buffer
5. Host returns certs

I have sent a V3 series. I left this patch largely the same but added a
second patch to fix up the extended request retrying.

> >
> > The logic required would be parsing the GUID table? I think we'd
> > rather keep that out of the kernel driver, right?
>
> No, that's not the logic I'm thinking of. I'm just thinking of using the
> userspace-specified npages and acting accordingly.
>
> Thanks,
> Tom
>
> >
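
For reference, a hypothetical sketch of the caching scheme described
above. Again, this is not the actual driver code: cached_cert_npages, the
helper name, and its parameter list are made up, locking is omitted, and
the request call just mirrors the shape of the one in the quoted diff.

/* Assumed driver-private cache of the last size the hypervisor reported. */
static unsigned int cached_cert_npages;

static int get_cert_npages_cached(struct snp_guest_dev *snp_dev,
				  struct snp_guest_request_ioctl *arg,
				  struct snp_ext_report_req *req,
				  struct snp_report_resp *resp, u32 resp_len,
				  unsigned int npages, unsigned int *npages_out)
{
	int ret;

	/* Fast path: a 0-page query just reports the cached minimum. */
	if (!npages && cached_cert_npages) {
		*npages_out = cached_cert_npages;
		return 0;
	}

	snp_dev->input.data_npages = npages;
	ret = handle_guest_request(snp_dev, SVM_VMGEXIT_EXT_GUEST_REQUEST,
				   arg->msg_version, SNP_MSG_REPORT_REQ,
				   &req->data, sizeof(req->data),
				   resp->data, resp_len, &arg->fw_err);
	/* Success or hard error: the caller's buffer was big enough. */
	if (arg->fw_err != SNP_GUEST_REQ_INVALID_LEN)
		return ret;

	/* Remember the new minimum the hypervisor reported ... */
	cached_cert_npages = snp_dev->input.data_npages;
	*npages_out = cached_cert_npages;

	/*
	 * ... then redo the request with that size, like the very first
	 * time, so the VMPCK sequence numbers still advance.
	 */
	snp_dev->input.data_npages = cached_cert_npages;
	return handle_guest_request(snp_dev, SVM_VMGEXIT_EXT_GUEST_REQUEST,
				    arg->msg_version, SNP_MSG_REPORT_REQ,
				    &req->data, sizeof(req->data),
				    resp->data, resp_len, &arg->fw_err);
}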