On Tue, Aug 24, 2021, Maxim Levitsky wrote: > On Tue, 2021-08-24 at 16:42 +0200, Vitaly Kuznetsov wrote: > > Eduardo Habkost <ehabkost@xxxxxxxxxx> writes: > > > > > On Tue, Aug 24, 2021 at 3:13 AM Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote: > > > > Eduardo Habkost <ehabkost@xxxxxxxxxx> writes: > > > > > > > > > On Mon, Aug 23, 2021 at 04:30:28PM +0200, Vitaly Kuznetsov wrote: > > > > > > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > > > > > > index ff005fe738a4..92cd4b02e9ba 100644 > > > > > > --- a/arch/x86/kvm/ioapic.c > > > > > > +++ b/arch/x86/kvm/ioapic.c > > > > > > @@ -319,7 +319,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) > > > > > > unsigned index; > > > > > > bool mask_before, mask_after; > > > > > > union kvm_ioapic_redirect_entry *e; > > > > > > - unsigned long vcpu_bitmap; > > > > > > + unsigned long vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)]; The preferred pattern is: DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS); > > > > > > > > > > Is there a way to avoid this KVM_MAX_VCPUS-sized variable on the > > > > > stack? This might hit us back when we increase KVM_MAX_VCPUS to > > > > > a few thousand VCPUs (I was planning to submit a patch for that > > > > > soon). > > > > > > > > What's the short- or mid-term target? > > > > > > Short term target is 2048 (which was already tested). Mid-term target > > > (not tested yet) is 4096, maybe 8192. > > > > > > > Note, we're allocating KVM_MAX_VCPUS bits (not bytes!) here, this means > > > > that for e.g. 2048 vCPUs we need 256 bytes of the stack only. In case > > > > the target much higher than that, we will need to either switch to > > > > dynamic allocation or e.g. use pre-allocated per-CPU variables and make > > > > this a preempt-disabled region. I, however, would like to understand if > > > > the problem with allocating this from stack is real or not first. > > > > > > Is 256 bytes too much here, or would that be OK? > > > > > > > AFAIR, on x86_64 stack size (both reqular and irq) is 16k, eating 256 Don't forget i386! :-) > > bytes of it is probably OK. I'd start worrying when we go to 1024 (8k > > vCPUs) and above (but this is subjective of course). 256 is fine, 1024 would indeed be problematic, e.g. CONFIG_FRAME_WARN defaults to 1024 on 32-bit kernels. That's not a hard limit per se, but ideally KVM will stay warn-free on all flavors of x86. > On the topic of enlarging these bitmaps to cover all vCPUs. > > I also share the worry of having the whole bitmap on kernel stack for very > large number of vcpus. > Maybe we need to abstract and use a bitmap for a sane number of vcpus, > and use otherwise a 'kmalloc'ed buffer? That's a future problem. More specifically, it's the problem of whoever wants to push KVM_MAX_VCPUS > ~2048. There are a lot of ways to solve the problem, e.g. this I/O APIC code runs under a spinlock so a dedicated bitmap in struct kvm_ioapic could be used to avoid a large stack allocation. > Also in theory large bitmaps might affect performance a bit. Maybe. The only possible degredation for small VMs, i.e. VMs that don't need the full bitmap, is if the compiler puts other variables below the bitmap and causes sub-optimal cache line usage. But I suspect/hope the compiler is smart enough to use GPRs and/or organize the local variables on the stack so that doesn't happen.