On Wed, Aug 16, 2017 at 11:07:55AM +0800, Lan Tianyu wrote:
> On 2017-08-15 22:10, Konrad Rzeszutek Wilk wrote:
> > On Tue, Aug 15, 2017 at 11:00:04AM +0800, Lan Tianyu wrote:
> >> On 2017-08-12 03:35, Konrad Rzeszutek Wilk wrote:
> >>> On Fri, Aug 11, 2017 at 03:00:20PM +0200, Radim Krčmář wrote:
> >>>> 2017-08-11 10:11+0200, David Hildenbrand:
> >>>>> On 11.08.2017 09:49, Lan Tianyu wrote:
> >>>>>> Hi Konrad:
> >>>>>> Thanks for your review.
> >>>>>>
> >>>>>> On 2017-08-11 01:50, Konrad Rzeszutek Wilk wrote:
> >>>>>>> On Thu, Aug 10, 2017 at 06:00:59PM +0800, Lan Tianyu wrote:
> >>>>>>>> Intel Xeon phi chip will support 352 logical threads. For HPC usage
> >>>>>>>> case, it will create a huge VM with vcpu number as same as host cpus. This
> >>>>>>>> patch is to increase max vcpu number to 352.
> >>>>>>>
> >>>>>>> Why not 1024 or 4096?
> >>>>>>
> >>>>>> This is on demand. We can set a higher number since KVM already has
> >>>>>> x2apic and vIOMMU interrupt remapping support.
> >>>>>>
> >>>>>>>
> >>>>>>> Are there any issues with increasing the value from 288 to 352 right now?
> >>>>>>
> >>>>>> No found.
> >>>>
> >>>> Yeah, the only issue until around 2^20 (when we reach the maximum of
> >>>> logical x2APIC addressing) should be the size of per-VM arrays when only
> >>>> few VCPUs are going to be used.
> >>>
> >>> Migration with 352 CPUs all being busy dirtying memory and also poking
> >>> at various I/O ports (say all of them dirtying the VGA) is no problem?
> >>
> >> This depends on what kind of workload is running during migration. I
> >> think this may affect service down time since there maybe a lot of dirty
> >> memory data to transfer after stopping vcpus. This also depends on how
> >> user sets "migrate_set_downtime" for qemu. But I think increasing vcpus
> >> will break migration function.
> >
> > OK, so let me take a step back.
> >
> > I see this nice 'supported' CPU count that is exposed in kvm module.
> >
> > Then there is QEMU throwing out a warning if you crank up the CPU count
> > above that number.
> >
> > Red Hat's web-pages talk about CPU count as well.
> >
> > And I am assuming all of those are around what has been tested and
> > what has shown to work. And one of those test-cases surely must
> > be migration.
>
> Sorry. This is a typo. I originally meant increasing vcpu shouldn't
> break migration function and just affect service downtime. If there was
> such issue, we should fix it.
>
> > Ergo, if the vCPU count increase will break migration, then it is
> > a regression.
> >
> > Or a fix/work needs to be done to support a higher CPU count for
> > migrating?
> >
> > Is my understanding incorrect?
>
> You are right.
>
> >>
> >>>
> >>>>
> >>>>>>> Also perhaps this should be made in an Kconfig entry?
> >>>>>>
> >>>>>> That will be anther option but I find different platforms will define
> >>>>>> different MAX_VCPU. If we introduce a generic Kconfig entry, different
> >>>>>> platforms should have different range.
> >>>
> >>> By different platforms you mean q35 vs the older one, and such?
> >>
> >> I meant x86, arm, sparc and other vendors' code define different max
> >> vcpu number.
> >
> > Right, and?
>
> If we introduce a general kconfig of max vcpus for all vendors, it
> should have different max vcpu range for different vendor.

Sounds sensible as well.

But based on this thread it seems that what is 'supported' and what the
code allows are completely at odds with each other. Meaning you may as
well go forth and put in a huge amount and it would be OK with the
maintainers?

>
> --
> Best regards
> Tianyu Lan
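
[Editorial sketch appended to the thread: the downtime concern discussed above can be made concrete with a back-of-envelope calculation. The per-vCPU dirty rate and link speed below are illustrative assumptions, not measurements from the patch or from QEMU; the point is only that the final stop-and-copy pass must fit within `migrate_set_downtime`, and that the residual dirty set scales with how fast the vCPUs dirty memory.]

```python
# Back-of-envelope for the migration-downtime discussion (editorial sketch;
# the per-vCPU dirty rate and link speed are assumptions, not measurements).

PAGE = 4096                       # bytes per x86 page
VCPUS = 352                       # the proposed KVM_MAX_VCPUS value
DIRTY_PAGES_PER_VCPU = 1000       # pages/s each busy vCPU dirties (assumed)
LINK = 10 * 1024**3 // 8          # 10 Gbit/s migration link, in bytes/s

dirty_bps = VCPUS * DIRTY_PAGES_PER_VCPU * PAGE   # bytes/s the guest dirties

# Pre-copy converges only if the link outruns the dirty rate; otherwise
# the dirty set never shrinks and migration stalls in the iterative phase.
converges = dirty_bps < LINK

def max_residual_bytes(downtime_s: float) -> float:
    """Largest residual dirty set a given downtime budget can drain
    during the final stop-and-copy (vCPUs stopped)."""
    return LINK * downtime_s

print(f"guest dirties {dirty_bps / 2**20:.0f} MiB/s, converges: {converges}")
print(f"a 300 ms downtime can drain {max_residual_bytes(0.3) / 2**20:.0f} MiB")
```

With these assumed numbers the 352 vCPUs dirty ~1375 MiB/s against a link that drains ~1280 MiB/s, so this particular workload would not converge without throttling or a bigger pipe; that matches the clarified position in the thread that a higher vCPU count affects service downtime rather than breaking migration outright.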