On Mon, Sep 18 2023 at 21:02, Jason Gunthorpe wrote: > On Tue, Sep 19, 2023 at 01:47:37AM +0200, Thomas Gleixner wrote: >> >> > The intree way to alter the MSI configuration is via >> >> > sriov_set_msix_vec_count, and there is only one in-tree driver that >> >> > uses it right now. >> >> >> >> Right, but that only addresses the driver specific issues. >> > >> > Sort of.. sriov_vf_msix_count_store() is intended to be the entry >> > point for this and if the kernel grows places that cache the value or >> > something then this function should flush those caches too. >> >> Sorry. What I wanted to say is that the driver callback is not the right >> place to reload the MSI domains after the change. > > Oh, that isn't even what Shannon's patch does, it patched VFIO's main > PCI driver - not a sriov_set_msix_vec_count() callback :( Shannon's > scenario doesn't even use sriov_vf_msix_count_store() at all - the AMD > device just randomly changes its MSI count whenever it likes. Ooops. When real hardware changes things behind the kernels back we consider it a hardware bug. The same applies to virtualization muck. So all we should do is add some code which yells when the "hardware" plays silly buggers. >> > I suppose flushing happens implicitly because Shannon reports that >> > things work fine if the driver is rebound. Since >> > sriov_vf_msix_count_store() ensures there is no driver bound before >> > proceeding it probe/unprobe must be flushing out everything? >> >> Correct. So sriov_set_msix_vec_count() could just do: >> >> ret = pdev->driver->sriov_set_msix_vec_count(vf_dev, val); >> if (!ret) >> teardown_msi_domain(pdev); >> >> Right? > > It subtly isn't needed, sriov_vf_msix_count_store() already requires > no driver is associated with the device and this: > > int msi_setup_device_data(struct device *dev) > { > struct msi_device_data *md; > int ret, i; > > if (dev->msi.data) > return 0; > > md = devres_alloc(msi_device_data_release, sizeof(*md), GFP_KERNEL); > if (!md) > return -ENOMEM; > > Already ensured that msi_remove_device_irq_domain() was called via > msi_device_data_release() triggering as part of the devm shutdown of > the bound driver. Indeed. > So, the intree mechanism to change the MSI vector size works. The > crazy mechanism where the device just changes its value without > synchronizing to the OS does not. > > I don't think we need to try and fix that.. We might want to detect it and yell about it, right? Thanks, tglx