On Wed, Oct 30, 2024 at 11:55:01AM -0500, Bjorn Helgaas wrote: > On Wed, Oct 30, 2024 at 12:43:19PM +0100, Michał Winiarski wrote: > > On Mon, Oct 28, 2024 at 11:56:04AM -0500, Bjorn Helgaas wrote: > > > On Fri, Oct 25, 2024 at 11:50:36PM +0200, Michał Winiarski wrote: > > > > VF MMIO resource reservation, either created by system firmware and > > > > inherited by Linux PCI subsystem or created by the subsystem itself, > > > > should contain enough space to fit the BAR of all SR-IOV Virtual > > > > Functions that can potentially be created (total VFs supported by the > > > > device). > > > > > > I don't think "VF resource reservation ... should contain enough > > > space" is really accurate or actionable. It would be *nice* if the PF > > > BAR is large enough to accommodate the largest supported VF BARs for > > > all possible VFs, but if it doesn't, it's not really an error. It's > > > just a reflection of the fact that resource space is limited. > > > > From PCI perspective, you're right, IOV resources are optional, and it's > > not really an error for PF device itself. > > From IOV perspective - we do need those resources to be able to create > > VFs. > > > > All I'm trying to say here, is that the context of the change is the > > "success" case, where the VF BAR reservation was successfully assigned, > > and the PF will be able to create VFs. > > The case where there were not enough resources for VF BAR (and PF won't > > be able to create VFs) remains unchanged. > > > > > > However, that assumption only holds in an environment where VF BAR size > > > > can't be modified. > > > > > > There's no reason to assume anything about how many VF BARs fit. The > > > existing code should avoid enabling the requested nr_virtfn VFs if the > > > PF doesn't have enough space -- I think that's what the "if > > > (res->parent)" is supposed to be checking. > > > > > > The fact that you need a change here makes me suspect that we're > > > missing some resource claim (and corresponding res->parent update) > > > elsewhere when resizing the VF BAR. > > > > My understanding is that res->parent is only expressing that the > > resource is assigned. > > We don't really want to change that, the resource is still there and is > > assigned - we just want to make sure that VF enabling fails if the > > caller wants to enable more VFs than possible for current resource size. > > > > Let's use an example. A device with a single BAR. > > initial_vf_bar_size = X > > total_vfs = 4 > > supported_vf_resizable_bar_sizes = X, 2X, 4X > > In addition, IIUC we're assuming the PF BAR size is 4X, since the > conclusion is that 4 VF BARs of size X fill it completely. The PF BAR is initially sized based on VF BAR size [1]. So yeah - that's the assumption for the initial state (prior to doing any resizing). For VF PCI device BAR 0, it would be PF PCI device resource 7 (and it's programmed using PCI SR-IOV extended capability - so slightly different path then regular PCI BARs). Note that this resource is completely independent from actual PF BAR 0 (resource 0), which is why I'm calling it "underlying resource" or "reservation". [1] https://elixir.bootlin.com/linux/v6.11/source/drivers/pci/iov.c#L807 > > With that - the initial underlying resource looks like this: > > +----------------------+ > > |+--------------------+| > > || || > > |+--------------------+| > > |+--------------------+| > > || || > > |+--------------------+| > > |+--------------------+| > > || || > > |+--------------------+| > > |+--------------------+| > > || || > > |+--------------------+| > > +----------------------+ > > Its size is 4X, and it contains BAR for 4 VFs. > > "resource_size >= vf_bar_size * num_vfs" is true for any num_vfs > > Let's assume that there are enough resources to assign it. > > > > Patch 4/7 allows to resize the entire resource (in addition to changing > > the VF BAR size), which means that after calling: > > pci_resize_resource() with size = 2X, the underlying resource will look > > like this: > > +----------------------+ > > |+--------------------+| > > || || > > || || > > || || > > || || > > |+--------------------+| > > |+--------------------+| > > || || > > || || > > || || > > || || > > |+--------------------+| > > |+--------------------+| > > || || > > || || > > || || > > || || > > |+--------------------+| > > |+--------------------+| > > || || > > || || > > || || > > || || > > |+--------------------+| > > +----------------------+ > > Its size is 8X, and it contains BAR for 4 VFs. > > "resource_size >= vf_bar_size * num_vfs" is true for any num_vfs > > With the assumption that the PF BAR size is 4X, these VFs would no > longer fit. I guess that's basically what you say here: Exactly - it wouldn't fit, unless we resize the underlying resource as well. Which is what the successfull call to pci_resize_resource() with size = 2X, will do. > > It does require an extra 4X of MMIO resources, so this can fail in > > resource constrained environment, even though the original 4X resource > > was able to be assigned. > > The following patch 6/7 allows to change VF BAR size without touching > > the underlying reservation size. > > After calling pci_iov_vf_bar_set_size() to 4X and enabling a single VF, > > the underlying resource will look like this: > > +----------------------+ > > |+--------------------+| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > ||░░░░░░░░░░░░░░░░░░░░|| > > |+--------------------+| > > +----------------------+ > > Its size is 4X, but since pci_iov_vf_bar_set_size() was called, it is no > > longer able to accomodate 4 VFs. > > "resource_size >= vf_bar_size * num_vfs" is only true for num_vfs = 1 > > and any attempts to create more than 1 VF should fail. > > We don't need to worry about being MMIO resource constrained, no extra > > MMIO resources are needed. > > IIUC this series only resizes VF BARs. Those VF BARs are carved out > of a PF BAR, and this series doesn't touch the PF BAR resizing path. > I guess the driver might be able to increase the PF BAR size if > necessary, and then increase the VF BAR size. No - it is possible to resize the PF BAR as well. In addition to adding pci_iov_vf_bar_set_size() / pci_iov_vf_bar_get_size(), the series expands the pci_resize_resource() function, so that it can accept IOV resources (PF BARs 7+, or "underlying resource" for VF BAR). The usage of pci_resize_resource() for IOV resources is the same as for the regular PCI BARs, with the caller expected to release all the resource prior to resizing it (as the bridge windows may need to be reprogrammed). > It sounds like this patch is really a bug fix independent of VF BAR > resizing. If we currently allow enabling more VFs than will fit in a > PF BAR, that sounds like a bug. It's not really a bug fix. Without the ability to control VF BAR size independently of PF BAR size (via pci_iov_vf_bar_set_size() / pci_iov_vf_bar_get_size()), PF BAR size is always tied to the VF BAR size and the total (max) number of VFs supported by the device, so there's no need to check if the "nr_virtfn" will fit - as "nr_virtfn" is guaranteed to be smaller than the total number of VFs supported by the device. > So if we try to enable too many VFs, sriov_enable() should fail. I > still don't see why this check should change the res->parent test, > though. The logic for res->parent isn't really changed - we are still failing VF enabling if the resource is not assigned. Previously we were incrementing resource counter if the resource is assigned: if (res->parent) nres++; Now, we're not incrementing the resource counter if the resource is not assigned or if it is too small to fit all enabled VFs: if (!res->parent || vf_bar_sz * nr_virtfn > resource_size(res)) continue; I can split it into two conditions, if you think it would be clearer: if (vf_bar_sz * nr_virtfn > resource_size(res)) continue; if (res->parent) nres++; -Michał > > > > > Add an additional check that verifies that VF BAR for all enabled VFs > > > > fits within the underlying reservation resource. > > > > > > > > Signed-off-by: Michał Winiarski <michal.winiarski@xxxxxxxxx> > > > > --- > > > > drivers/pci/iov.c | 8 ++++++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > > > > index 79143c1bc7bb4..5de828e5a26ea 100644 > > > > --- a/drivers/pci/iov.c > > > > +++ b/drivers/pci/iov.c > > > > @@ -645,10 +645,14 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn) > > > > > > > > nres = 0; > > > > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > > > > + int vf_bar_sz = pci_iov_resource_size(dev, > > > > + pci_resource_to_iov(i)); > > > > bars |= (1 << pci_resource_to_iov(i)); > > > > res = &dev->resource[pci_resource_to_iov(i)]; > > > > - if (res->parent) > > > > - nres++; > > > > + if (!res->parent || vf_bar_sz * nr_virtfn > resource_size(res)) > > > > + continue; > > > > + > > > > + nres++; > > > > } > > > > if (nres != iov->nres) { > > > > pci_err(dev, "not enough MMIO resources for SR-IOV\n"); > > > > -- > > > > 2.47.0 > > > >