On Sat, May 6, 2023 at 12:08 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> > I am fine with either of these two names for the param. Adjusting from
> > enable_msi_remaping to disable_msi_bypass was aimed at addressing
> > your comment about dealing with devices that do not support bypass.
> > In the vmd driver, the bypass feature is enabled by adding the flag
> > "VMD_FEAT_CAN_BYPASS_MSI_REMAP", so I think "disabling bypass" seems
> > more appropriate. This patch aims to provide a convenient way to
> > remove that flag in a specific case.
>
> Users don't care about the name of VMD_FEAT_CAN_BYPASS_MSI_REMAP. I
> don't think that's a very good name either (in my opinion
> "VMD_FEAT_MSI_REMAP_DISABLE" would be more descriptive, and
> VMCONFIG_MSI_REMAP is even worse since setting it *disables* MSI
> remapping), but in any case these are internal to the driver.
>
> > On Sat, Apr 29, 2023 at 2:40 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > The "disable_msi_bypass" parameter name also leads to some complicated
> > > logic. IIUC, "disable_msi_bypass=0" means "do not disable MSI remap
> > > bypassing" or, in other words, "do not remap MSIs." This is only
> > > supported by some VMDs. Using "disable_msi_bypass=0" to *enable* the
> > > bypass feature is confusing.
> >
> > However, as you said, it does lead to some complicated logic, so I
> > believe these two approaches each have their own pros and cons.
> >
> > > I still don't understand what causes the performance problem here. I
> > > guess you see higher performance when the VMD remaps child MSIs? So
> > > adding the VMD MSI-X domain interrupt handler and squashing all the
> > > child MSI vectors into the VMD MSI vector space makes things better?
> > > That seems backwards. Is this because of cache effects or something?
> > >
> > > What does "excessive pressure on the PCIe node" mean? I assume the
> > > PCIe node means the VMD? It receives the same number of child
> > > interrupts in either case.
> >
> > What I mean is that there are performance issues when a PCIe domain
> > is fully loaded with four Gen4 NVMe devices, like this:
> >
> > +-[10002:00]-+-01.0-[01]----00.0 device0
> > |            +-03.0-[02]----00.0 device1
> > |            +-05.0-[03]----00.0 device2
> > |            \-07.0-[04]----00.0 device3
> >
> > According to perf and irqtop, the OS does receive the same number of
> > interrupts in both modes. However, in the configuration above we
> > observed a significant increase in CPU idle time. Moreover, the data
> > and performance with the VMD bypass feature enabled were identical to
> > those with VMD disabled, and after reducing the number of devices
> > mounted on a domain, the IOPS of the individual devices rebounds
> > somewhat. Therefore, we speculate that VMD plays a role in balancing
> > and buffering the interrupt load, and in this situation we believe
> > VMD ought not to be bypassed for MSI handling.
>
> The proposed text was:
>
>   Use this when multiple NVMe devices are mounted on the same PCIe
>   node with a high volume of 4K random I/O. It mitigates excessive
>   pressure on the PCIe node caused by numerous interrupts from NVMe
>   drives, resulting in improved I/O performance. Such as:
>
> The NVMe configuration and workload you mentioned works better with
> MSI-X remapping. But I don't know *why*, and I don't know that NVMe
> is the only device affected. So it's hard to write useful guidance
> for users, other than "sometimes it helps."
>
> Straw man proposal:
>
>   msi_remap=0
>
>     Disable VMD MSI-X remapping, which improves interrupt performance
>     because child device interrupts avoid the VMD MSI-X domain
>     interrupt handler. Not all VMD devices support this, but for
>     those that do, this is the default.
>
>   msi_remap=1
>
>     Remap child MSI-X interrupts into VMD MSI-X interrupts. This
>     limits the number of MSI-X vectors available to the whole child
>     device domain to the number of VMD MSI-X interrupts.
>
>     This may be required in some virtualization scenarios.
>
>     This may improve performance in some I/O topologies, e.g., several
>     NVMe devices doing lots of random I/O, but we don't know why.
>
> I hate the idea of "we don't know why." If you *do* know why, it
> would be much better to outline the mechanism because that would help
> users know when to use this. But if we don't know why, admitting that
> straight out is better than hand-waving about excessive pressure, etc.

I completely agree with you. I discovered this issue by bisecting and
drew the above conclusions from the observed data and experiments, so
the explanation is deduced from results rather than from a known
mechanism. I have not yet been able to determine the precise cause.

> The same applies to the virtualization caveat. The text above is not
> actionable -- how do users know whether their particular
> virtualization scenario requires this option?

I will add a note to make it clear that bypass mode does not work in
guest passthrough mode and that only remapping mode can be used there.
To make the straw man concrete, I have also put a rough sketch of how I
would wire it into vmd_probe() at the end of this mail.

Thanks
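
P.S. Below is an illustrative sketch only, not a proposed patch, of how
I picture the knob fitting into drivers/pci/controller/vmd.c. The
parameter name "msi_remap", the tri-state default, and the
vmd_adjust_features() helper are my assumptions for the sake of
discussion; vmd_probe() would call the helper on the features value it
already reads from id->driver_data.

/* Hypothetical module parameter: -1 keeps the per-device default,
 * 0 requests MSI-X remap bypass where supported, 1 forces remapping.
 */
static int msi_remap = -1;
module_param(msi_remap, int, 0444);
MODULE_PARM_DESC(msi_remap,
        "VMD MSI-X remapping (-1=per-device default, 0=bypass where supported, 1=remap)");

/* Hypothetical helper, called from vmd_probe() before the feature
 * flags are acted on; "features" comes from id->driver_data there.
 */
static unsigned long vmd_adjust_features(struct pci_dev *dev,
                                         unsigned long features)
{
        if (msi_remap == 1) {
                /* Remapping is always possible: drop the bypass flag. */
                features &= ~VMD_FEAT_CAN_BYPASS_MSI_REMAP;
        } else if (msi_remap == 0 &&
                   !(features & VMD_FEAT_CAN_BYPASS_MSI_REMAP)) {
                /* Bypass requested, but this VMD cannot do it. */
                pci_warn(dev, "MSI-X remap bypass not supported, remapping child MSI-X\n");
        }
        return features;
}

Either name ends up clearing the same VMD_FEAT_CAN_BYPASS_MSI_REMAP
flag; the difference is only what users have to type, which is where
"disable_msi_bypass=0" turns into the double negative you pointed out.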