Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Tue, Aug 24, 2021, Maxim Levitsky wrote:
>> On Tue, 2021-08-24 at 16:42 +0200, Vitaly Kuznetsov wrote:
>> > Eduardo Habkost <ehabkost@xxxxxxxxxx> writes:
>> >
>> > > On Tue, Aug 24, 2021 at 3:13 AM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>> > > > Eduardo Habkost <ehabkost@xxxxxxxxxx> writes:
>> > > >
>> > > > > On Mon, Aug 23, 2021 at 04:30:28PM +0200, Vitaly Kuznetsov wrote:
>> > > > > > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> > > > > > index ff005fe738a4..92cd4b02e9ba 100644
>> > > > > > --- a/arch/x86/kvm/ioapic.c
>> > > > > > +++ b/arch/x86/kvm/ioapic.c
>> > > > > > @@ -319,7 +319,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>> > > > > >  	unsigned index;
>> > > > > >  	bool mask_before, mask_after;
>> > > > > >  	union kvm_ioapic_redirect_entry *e;
>> > > > > > -	unsigned long vcpu_bitmap;
>> > > > > > +	unsigned long vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
>
> The preferred pattern is:
>
> 	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
>

Yes, thanks!

>> > > > >
>> > > > > Is there a way to avoid this KVM_MAX_VCPUS-sized variable on the
>> > > > > stack? This might hit us back when we increase KVM_MAX_VCPUS to
>> > > > > a few thousand VCPUs (I was planning to submit a patch for that
>> > > > > soon).
>> > > >
>> > > > What's the short- or mid-term target?
>> > >
>> > > Short term target is 2048 (which was already tested). Mid-term target
>> > > (not tested yet) is 4096, maybe 8192.
>> > >
>> > > > Note, we're allocating KVM_MAX_VCPUS bits (not bytes!) here, this means
>> > > > that for e.g. 2048 vCPUs we need 256 bytes of the stack only. In case
>> > > > the target is much higher than that, we will need to either switch to
>> > > > dynamic allocation or e.g. use pre-allocated per-CPU variables and make
>> > > > this a preempt-disabled region. I, however, would like to understand if
>> > > > the problem with allocating this from stack is real or not first.
>> > >
>> > > Is 256 bytes too much here, or would that be OK?
>> > >
>> >
>> > AFAIR, on x86_64 stack size (both regular and irq) is 16k, eating 256
>
> Don't forget i386! :-)
>

I'm not forgetting, I'm deliberately ignoring its existence :-) Whoever
tries to raise KVM_MAX_VCPUS from '288' may limit the change to x86_64,
I seriously doubt 32-bit users want to run guests with thousands of CPUs.

>> > bytes of it is probably OK. I'd start worrying when we go to 1024 (8k
>> > vCPUs) and above (but this is subjective of course).
>
> 256 is fine, 1024 would indeed be problematic, e.g. CONFIG_FRAME_WARN defaults to
> 1024 on 32-bit kernels. That's not a hard limit per se, but ideally KVM will stay
> warn-free on all flavors of x86.

Thanks for the CONFIG_FRAME_WARN pointer, I said '1024' off the top of my
head but it seems the number wasn't random after all :)

>
>> On the topic of enlarging these bitmaps to cover all vCPUs.
>>
>> I also share the worry of having the whole bitmap on kernel stack for very
>> large number of vcpus.
>> Maybe we need to abstract and use a bitmap for a sane number of vcpus,
>> and use otherwise a 'kmalloc'ed buffer?
>
> That's a future problem. More specifically, it's the problem of whoever wants to
> push KVM_MAX_VCPUS > ~2048. There are a lot of ways to solve the problem, e.g.
> this I/O APIC code runs under a spinlock so a dedicated bitmap in struct kvm_ioapic
> could be used to avoid a large stack allocation.

+1

>
>> Also in theory large bitmaps might affect performance a bit.
>
> Maybe. The only possible degradation for small VMs, i.e. VMs that don't need the
> full bitmap, is if the compiler puts other variables below the bitmap and causes
> sub-optimal cache line usage. But I suspect/hope the compiler is smart enough to
> use GPRs and/or organize the local variables on the stack so that doesn't happen.

-- 
Vitaly