On Thu, 17 Mar 2022 11:13:44 +0000 David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote: > On Thu, 2022-03-17 at 10:05 +0100, Igor Mammedov wrote: > > re-sending reply as something went wrong with headers (I suspect Daniel's name formatting) > > and email got bounced back. > > > > On Wed, 16 Mar 2022 14:31:33 +0000 > > David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote: > > > > > On Wed, 2022-03-16 at 12:28 +0100, Igor Mammedov wrote: > > > > Generally Daniel is right, as long as it's something that what real hardware > > > > supports. (usually it's job if upper layers which know what guest OS is used, > > > > and can tweak config based on that knowledge). > > > > > > > > But it's virt only extension and none (tested with > > > > Windows (hangs on boot), > > > > Linux (brings up only first 255 cpus) > > > > ) of mainline OSes ended up up working as expected (i.e. user asked for this > > > > many CPUs but can't really use them as expected). > > > > > > As I said, that kind of failure mode will happen even with the split > > > irq chip and EXT_DEST_ID, with Windows and older (pre-5.10) Linux > > > kernels. > > > > > > For older guests it would also happen on real hardware, and in VMs > > > where you expose an IOMMU with interrupt remapping. Some guests don't > > > support interrupt remapping, or don't support X2APIC at all. > > > > > > > Which would just lead to users reporting (obscure) bugs. > > > > > > It's not virt only. This could happen with real hardware. > > > > I was talking about EXT_DEST_ID kvm extension. > > Then I'm confused, because that isn't the conversation we were having > before. And in that case what you say above about Linux only bringing > up the first 255 CPUs directly contradicts what you say below, my own > experience, and the whole *point* of the EXT_DEST_ID extension :) Now I'm lost in translation too :) > Let's start again. You observed that Linux guests fail to bring up >254 > vcPUs if qemu doesn't enable the EXT_DEST_ID support, and your fix was > to require EXT_DEST_ID (as a side-effect of requiring split irqchip). > > This reminded me of the fixes I'd been posting since 2020 which still > haven't been merged, so I dusted those off and resent them. > > I didn't incorporate your change, and objected to your patch because I > think it's pointless babysitting. 1) > Yes, in the general case if you want > your guest to use more than 254 vCPUs you need to take a moment to > think about precisely what your guest operating system requires in > order to support that. > > At the very least it needs X2APIC support, and then you need *one* of: > > • EXT_DEST_ID, > • Interrupt remapping, or > • just using those vCPUs without external interrupts. > > Both of the first two require the split irqchip, so your patch just > doesn't let users rely on that last option. I conceded (cited below) > because I don't know of any existing guest OS which does use that last > option. I'd attempted to make Linux do so, but eventually abandoned it: > https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/irqaffinity Given that is was abandoned, it's unlikely that the last option was ever used (and before EXT_DEST_ID, qemu was requiring irq remapping to start guest with so many vcpus). If there will be a user for it in the future we can relax restriction given that it will be properly documented/ user gets sane error/warning messages, so they could figure out what to do. > But now you seem to be making a different argument? > > > > Anyway, as far as I'm concerned it doesn't matter very much whether we > > > insist on the split irq chip or not. Feel free to repost your patch > > > rebased on top of my fixes, which are also in my tree at > > > https://git.infradead.org/users/dwmw2/qemu.git > > > > > > > > > The check you're modifying has moved to x86_cpus_init(). > > > > if we are to keep iommu dependency then moving to x86_cpus_init() > > isn't an option, it should be done at pc_machine_done() time. > > Thus far, I didn't think anyone had been talking about a dependency on > IOMMU. That doesn't make any sense at all. EXT_DEST_ID is perfectly > sufficient for Linux kernels from 5.10 onwards and they don't need the > IOMMU. IOMMU was required before EXT_DEST_ID due to irq-remapping dependency, and that conservative config worked fine for both Linux and Windows guests. That's why I've raised question if we should revert restriction to the way it was back then. With Linux pre-5.10 guests, dmesg output at least complains about IRQ remapping, so user has a small chance to be able to figure out that IOMMU should be configured to get all CPUs working. For post-5.10, all one gets is "bad cpu" without any clue as to why, if EXT_DEST_ID is not advertised by hypervisor. It would be better if guest kernel printed some error/warning in that case. If we start with IOMMU, (Win/Linux) guests boot fine (modulo ancient ones) (irq-remapping is 'on' by default since qemu-4.0). > So no. Post your patch to s/kvm_irqchip_in_kernel/kvm_irqchip_is_split/ > in x86_cpus_init() by all means; I don't care much about that and I've > even incorporated that change into my tree at > https://git.infradead.org/users/dwmw2/qemu.git so that if I do have to > repost these fixes yet again, it'll be included. Looks fine to me, thanks. > But let's not re-add the IOMMU dependency. That would be wrong. We should document possible options, somewhere in QEMU. So not only few would know about what options to use and when. Something along lines above [1].