On Thu, 13 Feb 2020 10:23:21 -0700
Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:

> On Thu, 13 Feb 2020 12:46:54 +0100
> Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
>
> > On Tue, 11 Feb 2020 16:05:42 -0700
> > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> >
> > > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > > fully isolated from the PF. The PF can always cause a denial of
> > > service to the VF, if not access data passed through the VF directly.
> > > This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
> > > and does not provide access itself to enabling SR-IOV on a PF. The
> > > IOMMU grouping mechanism might allow us a solution to this lack of
> > > isolation, however the deficiency isn't actually in the DMA path, so
> > > much as the potential cooperation between PF and VF devices. Also,
> > > if we were to force VFs into the same IOMMU group as the PF, we severely
> > > limit the utility of having independent drivers managing PFs and VFs
> > > with vfio.
> > >
> > > Therefore we introduce the concept of a VF token. The token is
> > > implemented as a UUID and represents a shared secret which must be set
> > > by the PF driver and used by the VF drivers in order to access a vfio
> > > device file descriptor for the VF. The ioctl to set the VF token will
> > > be provided in a later commit, this commit implements the underlying
> > > infrastructure. The concept here is to augment the string the user
> > > passes to match a device within a group in order to retrieve access to
> > > the device descriptor. For example, rather than passing only the PCI
> > > device name (ex. "0000:03:00.0") the user would also pass a vf_token
> > > UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").
> > > The device match string therefore becomes:
> > >
> > > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > >
> > > This syntax is expected to be extensible to future options as well, with
> > > the standard being:
> > >
> > > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
> >
> > Is this designed to be an AND condition? (I read it as such.)
>
> Not sure I understand, the device name is always required for
> compatibility, then zero or more key/value pairs may be needed
> depending on the vfio bus driver and device requirements. I'm not
> suggesting that the user would pass multiple vf_token= options and the
> implementation here would only parse the first. I'm really only
> suggesting the parsing format we'd use for multiple options, I'm not
> trying to dictate how a bus driver might make use of them. Does that
> make sense, should I change some wording?

Not multiple vf_token= options, but multiple, different options.

E.g. we have something like "$NAME foo=xyz bar=zyx". What is supposed to
happen?
- both the foo= and bar= values have to give a match
- either foo= or bar= has to match
- if foo= doesn't match, try bar=
- foo= and bar= are ignored, if not applicable
(My understanding is that $NAME matching continues to be mandatory?)

What should happen for vf_token= is reasonably clear, but I'm wondering
about further extensions, as you already talk about it.

>
> > >
> > > The device name must be first and option=value pairs are separated by
> > > spaces.
> > >
> > > The vf_token option is only required for VFs where the PF device is
> > > bound to vfio-pci. There is no change for PFs using existing host
> > > drivers.
> > >
> > > Note that in order to protect existing VF users, not only is it required
> > > to set a vf_token on the PF before VF devices can be accessed, but also
> > > if there are existing VF users, (re)opening the PF device must also
> > > provide the current vf_token as authentication.
> > > This is intended to
> > > prevent a VF driver starting with a trusted PF driver and later being
> > > replaced by an unknown driver. A vf_token is not required to open the
> > > PF device when none of the VF devices are in use by vfio-pci drivers.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > ---
> > >  drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
> > >  drivers/vfio/pci/vfio_pci_private.h |    8 ++
> > >  2 files changed, 134 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index 2ec6c31d0ab0..26aea9ac4863 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> > >  		vfio_pci_set_power_state(vdev, PCI_D3hot);
> > >  }
> > >
> > > +static struct pci_driver vfio_pci_driver;
> > > +
> > > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)
> >
> > Suggestion: call this _user_modify(), and have _user_add() and
> > _user_remove() as wrappers. That said, ...
>
> I did consider something along these lines, but it seems overly
> explicit for a helper that's used in two places with only two obvious
> and discrete values. If this were an exposed API, absolutely I'd agree.
>
> > > +{
> > > +	struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > +	struct vfio_device *pf_dev;
> > > +	struct vfio_pci_device *pf_vdev;
> > > +
> > > +	if (!vdev->pdev->is_virtfn)
> > > +		return;
> > > +
> > > +	pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > +	if (!pf_dev)
> > > +		return;
> > > +
> > > +	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > > +		vfio_device_put(pf_dev);
> > > +		return;
> > > +	}
> > > +
> > > +	pf_vdev = vfio_device_data(pf_dev);
> > > +
> > > +	mutex_lock(&pf_vdev->vf_token->lock);
> > > +	pf_vdev->vf_token->users += val;
> >
> > ...is this expected to always be >= 0?
> > If yes, it looks like a bug if
> > this is called with val==-n for n > users.
>
> I'm not sure if you're inadvertently pointing out the bug in the
> vfio_pci_open() path below where we increment token users before
> vfio_pci_enable() which can fail, or you're suggesting a WARN_ON here
> should this go negative. This is a static helper function, so I
> generally don't try to sanitize the inputs like I would for an exposed
> API.

Yes, if you let users drop below 0, it's an internal error. Still, I
think it's worth checking, so that we catch those internal errors early
on, so a WARN_ON is probably the right thing to do.

>
> > > +	mutex_unlock(&pf_vdev->vf_token->lock);
> > > +
> > > +	vfio_device_put(pf_dev);
> > > +}
> > > +
> > >  static void vfio_pci_release(void *device_data)
> > >  {
> > >  	struct vfio_pci_device *vdev = device_data;
> > > @@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
> > >  	if (!(--vdev->refcnt)) {
> > >  		vfio_spapr_pci_eeh_release(vdev->pdev);
> > >  		vfio_pci_disable(vdev);
> > > +		vfio_pci_vf_token_user_add(vdev, -1);
> > >  	}
> > >
> > >  	mutex_unlock(&vdev->reflck->lock);
> > > @@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
> > >  	mutex_lock(&vdev->reflck->lock);
> > >
> > >  	if (!vdev->refcnt) {
> > > +		vfio_pci_vf_token_user_add(vdev, 1);
> > >  		ret = vfio_pci_enable(vdev);
> > >  		if (ret)
> > >  			goto error;
>
> I think we want to include decrementing token users in the error path
> here.

Yes; good that my comment made you spot it, because I didn't notice :)

>
> > > @@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> > >  	mutex_unlock(&vdev->igate);
> > >  }
> > >
> > > +#define VF_TOKEN_ARG "vf_token="
> > > +
> > > +/* Called with vdev->vf_token->lock */
> > > +static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
> > > +{
> > > +	char *token;
> > > +	uuid_t uuid;
> > > +	int ret;
> > > +
> > > +	if (!opts)
> > > +		return -EINVAL;
> > > +
> > > +	token = strstr(opts, VF_TOKEN_ARG);
> > > +	if (!token)
> > > +		return -EINVAL;
> > > +
> > > +	token += strlen(VF_TOKEN_ARG);
> > > +
> > > +	ret = uuid_parse(token, &uuid);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
> > > +		return -EACCES;
> > > +
> > > +	return 0;

Again, I guess my objections below are a matter of taste; especially
because this function does the key=value parsing, and I'm not sure it's
the right place to do so.

> > > +}
> > > +
> > >  static int vfio_pci_match(void *device_data, char *buf)
> > >  {
> > >  	struct vfio_pci_device *vdev = device_data;
> > > +	char *opts;
> > > +
> > > +	opts = strchr(buf, ' ');
> > > +	if (opts) {
> > > +		*opts = 0;
> > > +		opts++;
> > > +	}
> > > +
> > > +	if (strcmp(pci_name(vdev->pdev), buf))
> > > +		return 0; /* No match */
> >
> > Up to here, this function is fine; but below, it gets a bit hard to
> > follow...
> >
> > > +
> > > +	if (vdev->pdev->is_virtfn) {
> > > +		struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > > +		struct vfio_device *pf_dev;
> > > +		int ret = 0;
> > > +
> > > +		pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > > +		if (pf_dev) {
> > > +			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
> > > +				struct vfio_pci_device *pf_vdev =
> > > +						vfio_device_data(pf_dev);
> > > +
> > > +				mutex_lock(&pf_vdev->vf_token->lock);
> > > +				ret = vfio_pci_vf_token_match(pf_vdev, opts);
> > > +				mutex_unlock(&pf_vdev->vf_token->lock);
> > > +			}
> > > +
> > > +			vfio_device_put(pf_dev);
> > > +
> > > +			if (ret)
> > > +				return -EACCES;
> > > +		}
> > > +	}
> >
> > If we are a VF, and the PF is bound to vfio, pass whatever stuff other
> > than the device name that was passed in to an opaque match function.
>
> vfio_pci_match() is an opaque match function relative to vfio.c, but
> there's nothing opaque here. We have a VF where the associated PF is
> bound to vfio-pci, therefore we require that the additional options
> include a vf_token matching the PF and we go looking to verify that.
>
> > > -	return !strcmp(pci_name(vdev->pdev), buf);
> > > +	if (vdev->vf_token) {
> > > +		int ret = 0;
> > > +
> > > +		mutex_lock(&vdev->vf_token->lock);
> > > +
> > > +		if (vdev->vf_token->users)
> > > +			ret = vfio_pci_vf_token_match(vdev, opts);
> > > +
> > > +		mutex_unlock(&vdev->vf_token->lock);
> > > +
> > > +		if (ret)
> > > +			return -EACCES;
> > > +	}
> >
> > If we have a VF token with users, pass whatever stuff other than the
> > device name that was passed in to an opaque match function.
>
> This description strays further off course a bit. If we have a
> vf_token then we are a PF and clearly bound to vfio-pci. If there are
> existing VF users then we require the user to provide a vf_token
> matching the one currently on the device.

Maybe my wording is just a bit off...

>
> > What about the following instead:
> >
> > - parse the passed-in string into device name and key/value pairs
> > - maybe reject anything with an unknown key
>
> This is definitely something that we should decide whether or not we
> want to do it. I think an argument for it is that a user can pick
> arbitrary key=value options that would be ignored with this
> implementation, but later might match a key that gets defined and then
> we break them. Misuse of the API on the part of the user, but maybe
> better to just prevent it ahead of time.

Yes, it's probably good to do this now.

>
> > - check the device name
> > - if we're a VF with the PF bound to vfio, require a VF token to be
> >   specified and pass it to a token match function
> > - if we have a VF token with users, require a VF token to be specified
> >   and pass it to a token match function
>
> This is essentially what we do above. Maybe we just disagree about
> whether we're calling an "opaque match function" versus a "token match
> function".

Maybe I should have said "function parsing a string, which might
contain a lot of unrelated stuff" vs "function explicitly handling a
vf_token value".

>
> >
> > My main gripes with the current code are:
> > - key=value parsing is done in the match function for vf_token
> > - it looks hard to extend the list of supported key/value pairs
> > - I don't see a good way to find out (as the user) _why_ the VF isn't
> >   matching
>
> If we want to reject unknown options, then yes, the parsing should be
> done ahead. I don't see that it's hard to extend though, each new
> requirement can follow the same methodology to check for it in the
> remaining options string.

If you pre-parse into option/value pairs, you see quite easily if you
managed to obtain a required option, if an option has been specified
more than once, or if an unknown option has been specified. If a new
option is introduced, you just need to handle whatever has been parsed
already.
Extending is probably not exactly hard, but pre-parsing likely makes it
easier, as you don't have implicit assumptions.

>
> The last point is the one I brought up in the cover letter and where
> I'm also not happy with the opaque error condition, but I have no
> thoughts on how to resolve it. Either we block the user from getting
> the device file descriptor, and they're left scratching their heads
> why, or we give them access to the file descriptor AND we need to
> impose barriers to access mechanisms (ex. block read/write/mmap, limit
> ioctls) AND we need to use VFIO_DEVICE_FEATURE and VFIO_DEVICE_GET_INFO
> as a mechanism for the user to figure out that the device requires
> "something" to get full access. With the latter, I'm concerned whether
> existing userspace code will fail in predictable ways and that it
> snowballs into an ugly API. For instance, if we add a flag to device
> info to indicate it's locked, existing code won't know about that flag,
> so we have to cripple device info to report no regions and no irqs to
> make that code fail. Then a user needs to know which feature to probe
> for to figure out how the device is locked, then once they do we make
> device info report real values? It's maybe a little more deterministic
> than blocking access to the file descriptor, but I'm not sure it's
> worth it. We could do a log-once if the match fails for token, but we
> need to be careful not to provide an obvious point where the user could
> spam the logs. I've also considered if we could write an error back
> into the user's buffer, but the ioctl isn't designed that way and we
> don't know if we'd break how the user consumes that buffer later.

Some extended reporting mechanism is likely to become unwieldy,
especially when we realize we missed something. A simple log message
that a vf_token is required (pointing to a more verbose description, if
possible) looks best (obviously rate limited or only printed once).
Just enough to give the user a hint so that they are not left
completely baffled.

>
> Ideas greatly welcomed in this space. Thanks for the review!
>
> Alex
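
To make the pre-parsing idea a bit more concrete, here is a rough
userspace sketch of what I have in mind. It is not kernel code and not
a proposal for the actual implementation: struct match_opts and
parse_match_string() are made-up names, the buffer sizes are arbitrary,
and the UUID value is only copied, not validated. It splits
"$NAME key=value ..." up front and rejects malformed tokens, unknown
keys, and keys given more than once, so that the later checks only need
to look at what has already been parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MATCH_NAME_MAX  64	/* arbitrary sizes for this sketch */
#define MATCH_VALUE_MAX 64

/* Hypothetical result type: one slot per known option. */
struct match_opts {
	char name[MATCH_NAME_MAX];	/* device name, always required */
	int has_vf_token;		/* 1 if vf_token= was supplied */
	char vf_token[MATCH_VALUE_MAX];	/* raw value, not validated as a UUID */
};

/*
 * Split "$NAME key=value ..." into *out.
 * Returns 0 on success and -1 on a missing or oversized name, a token
 * that is not key=value, an unknown key, or a duplicated key.
 */
static int parse_match_string(const char *buf, struct match_opts *out)
{
	const char *p = buf;
	size_t len;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));	/* also NUL-terminates both strings */

	/* The device name comes first and runs up to the first space. */
	len = strcspn(p, " ");
	if (len == 0 || len >= sizeof(out->name) || memchr(p, '=', len))
		return -1;
	memcpy(out->name, p, len);
	p += len;

	for (;;) {
		const char *eq;
		size_t klen, vlen;

		p += strspn(p, " ");		/* skip separators */
		if (*p == '\0')
			return 0;		/* everything consumed */

		len = strcspn(p, " ");		/* length of this token */
		eq = memchr(p, '=', len);
		if (!eq || eq == p)
			return -1;		/* not a key=value pair */
		klen = (size_t)(eq - p);
		vlen = len - klen - 1;

		if (klen == strlen("vf_token") &&
		    memcmp(p, "vf_token", klen) == 0) {
			if (out->has_vf_token ||
			    vlen >= sizeof(out->vf_token))
				return -1;	/* duplicate or oversized */
			memcpy(out->vf_token, eq + 1, vlen);
			out->has_vf_token = 1;
		} else {
			return -1;		/* unknown option */
		}
		p += len;
	}
}
```

With something like this, the VF and PF checks in the match function
would only need to test has_vf_token and compare the value, and a new
option would mean one more slot in the struct plus one more branch in
the parser.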