On Thu, Jan 14, 2021 at 11:24:12AM -0800, Alexander Duyck wrote: > > As you say BAR and MSI vector table are not so different, it seems > > perfectly in line with current PCI sig thinking to allow resizing the > > MSI as well > > The resizing of a BAR has an extended capability that goes with it and > isn't something that the device can just do on a whim. This patch set > is not based on implementing some ECN for resizable MSI-X tables. I > view it as arbitrarily rewriting the table size for a device after it > is created. The only difference is resizing the BAR is backed by an ECN, and this is an extension. The device does not "do it on a whim" the OS tells it when to change the size, exactly like for BAR resizing. > In addition Leon still hasn't answered my question on preventing the > VF driver from altering entries beyond the ones listed in the table. Of course this is blocked, the FW completely revokes the HW resource backing the vectors. > From what I can tell, the mlx5 Linux driver never reads the MSI-X > flags register so it isn't reading the MSI-X size either. I don't know why you say that. All Linux drivers call into something like pci_alloc_irq_vectors() requesting a maximum # of vectors and that call returns the actual allocated. Drivers can request more vectors than the system provides, which is what mlx5 does. Under the call chain of pci_alloc_irq_vectors() it calls pci_msix_vec_count() which does pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control); return msix_table_size(control); And eventually uses that to return the result to the driver. So, yes, it reads the config space and ensures it doesn't allocate more vectors than that. Every driver using MSI does this in Linux. Adjusting config space *directly* limits the number of vectors the driver allocates. You should be able to find the call chain in mlx5 based on the above guidance. > At a minimum I really think we need to go through and have a clear > definition on when updating the MSI-X table size is okay and when it > is not. I am not sure just saying to not update it when a driver isn't > attached is enough to guarantee all that. If you know of a real issue then please state it, other don't fear monger "maybe" issues that don't exist. > What we are talking about is the MSI-X table size. Not the number of > MSI-X vectors being requested by the device driver. Those are normally > two seperate things. Yes, table size is what is critical. The number of entries in that BAR memory is what needs to be controlled. > > The standards based way to communicate that limit is the MSI table cap > > size. > > This only defines the maximum number of entries, not how many have to be used. A driver can't use entries beyond the cap. We are not trying to reclaim vectors that are available but not used by the OS. > I'm not offering up a non-standard way to do this. Just think about > it. If I have a device that defines an MSI-X table size of 2048 but > makes use of only one vector how would that be any different than what > I am suggesting where you size your VF to whatever the maximum is you > need but only make use of some fixed number from the hardware. That is completely different, in the hypervisor there is no idea how many vectors a guest OS will create. The FW is told to only permit 1 vector. How is the guest to know this if we don't update the config space *as the standard requires* ? > I will repeat what I said before. Why can't this be handled via the > vfio interface? 1) The FW has to be told of the limit and everything has to be in sync If the FW is out of sync with the config space then everything breaks if the user makes even a small mistake - for instance forgetting to use the ioctl to override vfio. This is needlessly frail and complicated. 2) VFIO needs to know how to tell the FW the limit so it can override the config space with emulation. This is all device specific and I don't see that adding an extension to vfio is any better than adding one here. 3) VFIO doesn't cover any other driver that binds to the VF, so this "solution" leaves the normal mlx5_core functionally broken on VFs which is a major regression. Overall the entire idea to have the config space not reflect the actual current device configuration seems completely wrong to me - why do this? For what gain? It breaks everything. Jason