On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> writes:
>
> > On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
> >> "Michael S. Tsirkin" <mst@xxxxxxxxxx> writes:
> >>
> >> > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
> >> >> Look, it's very simple.
> >> > We only need to do it if we do a change that breaks guests.
> >> >
> >> > Please find a guest that is broken by the patches. You won't find any.
> >>
> >> I think the problem in this whole discussion is that we're talking past
> >> each other.
> >>
> >> Here is my understanding:
> >>
> >> 1) PCI-e says that you must be able to disable IO bars and still have a
> >> functioning device.
> >>
> >> 2) It says (1) because you must size IO bars to 4096 which means that
> >> practically speaking, once you enable a dozen or so PIO bars, you run
> >> out of PIO space (16 * 4k == 64k and not all that space can be used).
> >
> >
> > Let me add 3 other issues which I mentioned and you seem to miss:
> >
> > 3) architectures which don't have fast access to IO ports, exist
> >    virtio does not work there ATM
>
> Which architectures have PCI but no IO ports?
>
> > 4) setups with many PCI bridges exist and have the same issue
> >    as PCI express. virtio does not work there ATM
>
> This is not virtio specific.  This is true for all devices that use IO.

Absolutely. And you will find that modern devices make use of IO ports
optional.

> > 5) On x86, even with nested page tables, hardware only decodes
> >    the page address on an invalid PTE, not the data. You need to
> >    emulate the guest to get at the data. Without
> >    nested page tables, we have to do page table walk and emulate
> >    to get both address and data. Since this is how MMIO
> >    is implemented in kvm on x86, MMIO is much slower than PIO
> >    (with nested page tables by a factor of >2, did not test without).
>
> Am well aware of this, this is why we use PIO.
>
> I fully agree with you that when we do MMIO, we should switch the
> notification mechanism to avoid encoding anything meaningful as data.
>
> >> virtio-pci uses IO bars exclusively today.  Existing guest drivers
> >> assume that there is an IO bar that contains the virtio-pci registers.
> >> So let's consider the following scenarios:
> >>
> >> QEMU of today:
> >>
> >> 1) qemu -drive file=ubuntu-13.04.img,if=virtio
> >>
> >> This works today.  Does adding an MMIO bar at BAR1 break this?
> >> Certainly not if the device is behind a PCI bus...
> >>
> >> But are we going to put devices behind a PCI-e bus by default?  Are we
> >> going to ask the user to choose whether devices are put behind a legacy
> >> bus or the express bus?
> >>
> >> What happens if we put the device behind a PCI-e bus by default?  Well,
> >> it can still work.  That is, until we do something like this:
> >>
> >> 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
> >>         -device virtio-balloon..
> >>
> >> Such that we have more than a dozen or so devices.  This works
> >> perfectly fine today.  It works fine because we've designed virtio to
> >> make sure it works fine.  Quoting the spec:
> >>
> >> "Configuration space is generally used for rarely-changing or
> >>  initialization-time parameters. But it is a limited resource, so it
> >>  might be better to use a virtqueue to update configuration information
> >>  (the network device does this for filtering, otherwise the table in the
> >>  config space could potentially be very large)."
> >>
> >> In fact, we can have 100s of PCI devices today without running out of IO
> >> space because we're so careful about this.
> >>
> >> So if we switch to using PCI-e by default *and* we keep virtio-pci
> >> without modifying the device IDs, then very frequently we are going to
> >> break existing guests because the drivers they already have no longer
> >> work.
> >>
> >> A few virtio-serial channels, a few block devices, a couple of network
> >> adapters, the balloon and RNG driver, and we hit the IO space limit
> >> pretty damn quickly so this is not a contrived scenario at all.  I would
> >> expect that we frequently run into this if we don't address this problem.
> >>
> >> So we have a few options:
> >> 1) Punt all of this complexity to libvirt et al and watch people make
> >>    the wrong decisions about when to use PCI-e.  This will become yet
> >>    another example of KVM being too hard to configure.
> >>
> >> 2) Enable PCI-e by default and just force people to upgrade their
> >>    drivers.
> >>
> >> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
> >>
> >> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
> >
> > We can't do this - it will hurt performance.
>
> Can you explain?  I thought the whole trick with separating out the
> virtqueue notification register was to regain the performance?

Yes but this trick only works well with NPT (it's still a bit
slower than PIO but not so drastically).
Without NPT you still need a page walk so it will be slow.

> >>    give
> >>    it a new device/vendor ID.  Continue to use virtio-pci for existing
> >>    devices potentially adding virtio-{net,blk,...}-pcie variants for
> >>    people that care to use them.
> >>
> >> I think 1 == 2 == 3 and I view 2 as an ABI breaker.
> >
> > Why do you think 2 == 3? 2 changes default behaviour. 3 does not.
>
> It doesn't change the default behavior but then we're pushing the
> decision of when to use pci-e to the user.  They have to understand that
> there can be subtle breakages because the virtio-pci driver may not work
> if they are using an old guest.

pci-e is a separate issue. E.g. we can make pci-e a new device id.

> >> libvirt does like
> >> policy so they're going to make a simple decision and always use the
> >> same bus by default.
> >> I suspect if we made PCI the default, they might
> >> just always set the PCI-e flag just because.
> >
> > This sounds very strange. But let's assume you are right for
> > the sake of the argument ...
> >
> >> There are hundreds of thousands if not millions of guests with existing
> >> virtio-pci drivers.  Forcing them to upgrade better have an extremely
> >> good justification.
> >>
> >> I think 4 is the best path forward.  It's better for users (guests
> >> continue to work as they always have).  There's less confusion about
> >> enabling PCI-e support--you must ask for the virtio-pcie variant and you
> >> must have a virtio-pcie driver.  It's easy to explain.
> >
> > I don't see how this changes the situation. libvirt still needs
> > to set policy and decide which device to use.
>
> But virtio-pcie never exhausts the IO configuration space.  That's the
> difference.
>
> And virtio-pcie is a separate driver so presumably libvirt will make
> that visible in the XML.  In fact, it should.

It should already do that explicitly even if the device name is the same.
In fact, it does: bus name is pcie.0 versus pci.0

> >> It also maps to what regular hardware does.  I highly doubt that there
> >> are any real PCI cards that made the shift from PCI to PCI-e without
> >> bumping at least a revision ID.
> >
> > Only because the chance it's 100% compatible on the software level is 0.
> > It always has some hardware specific quirks.
> > No such excuse here.
> >
> >> It also means we don't need to play games about sometimes enabling IO
> >> bars and sometimes not.
> >
> > This last paragraph is wrong, it ignores the issues 3) to 5)
> > I added above.
> >
> > If you do take them into account:
> >     - there are reasons to add MMIO BAR to PCI,
> >       even without PCI express
>
> So far, the only reason you've provided is "it doesn't work on some
> architectures."  Which architectures?

PowerPC wants this.
> >     - we won't be able to drop IO BAR from virtio
>
> An IO BAR is useless if it means we can't have more than 12 devices.

It's not useless. A smart BIOS can enable devices one by one as
it tries to boot from them.

> >
> >> Regards,
> >>
> >> Anthony Liguori
> >>
> >> >
> >> >
> >> > --
> >> > MST
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
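
For reference, the PIO arithmetic from point 2) of the thread (16 * 4k == 64k,
and "not all that space can be used") can be sketched roughly as below. This is
only an illustration: the function name max_io_windows and the 16 KiB reserved
figure are assumptions made up for the example, not numbers taken from QEMU or
from any spec. The 4 KiB window size and 64 KiB total are the values quoted in
the thread.

```python
KIB = 1024


def max_io_windows(total_io_space=64 * KIB, window_size=4 * KIB, reserved=0):
    """Count how many window_size I/O windows fit in the usable PIO space.

    total_io_space and window_size default to the figures quoted in the
    thread (64k of port space, IO bars sized to 4096). `reserved` is a
    hypothetical amount of port space already claimed by other devices.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    usable = max(total_io_space - reserved, 0)
    return usable // window_size


# Upper bound from the thread: 16 * 4k == 64k.
print(max_io_windows())

# Assumed example: if 16 KiB of port space were unavailable, only a
# dozen windows would remain.
print(max_io_windows(reserved=16 * KIB))
```

With nothing reserved this gives the 16-window ceiling; with the assumed 16 KiB
reserved it gives 12, which is in line with the "dozen or so" and "more than 12
devices" figures used in the discussion.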