Re: Enable more than 255 VCPU support without irq remapping function in the guest

On 2016-04-28 03:11, Yang Zhang wrote:
> On 2016/4/27 17:45, Jan Kiszka wrote:
>> On 2016-04-27 11:39, Yang Zhang wrote:
>>> On 2016/4/27 13:24, Jan Kiszka wrote:
>>>> On 2016-04-27 06:10, Yang Zhang wrote:
>>>>> On 2016/4/27 0:49, Radim Krčmář wrote:
>>>>>> 2016-04-26 18:17+0200, Jan Kiszka:
>>>>>>> On 2016-04-26 18:14, Lan, Tianyu wrote:
>>>>>>>> Hi All:
>>>>>>>>
>>>>>>>> Recently I have been working on extending the max vcpu count to
>>>>>>>> more than 256 on both KVM and Xen. Some HPC use cases need many
>>>>>>>> vcpus. This requires using X2APIC in the guest, which supports
>>>>>>>> 32-bit APIC ids. The Linux kernel requires the irq remapping
>>>>>>>> function when enabling X2APIC if the max APIC id is more than 255
>>>>>>>> (for more detail, please see try_to_enable_x2apic()).
>>>>>>
>>>>>> Out of curiosity, how many VCPUs are you aiming at?
>>>>>>
>>>>>>>> The irq remapping function helps to deliver irqs to cpus 255 and
>>>>>>>> above. The IOAPIC only supports an 8-bit target APIC id field and
>>>>>>>> can only deliver irqs to cpus 0~255.
>>>>>>>>
>>>>>>>> So far neither KVM nor Xen enables the irq remapping function.
>>>>>>>> Enabling it seems to be a huge job, which needs a rework of the
>>>>>>>> IO-APIC, local APIC and MSI parts and the addition of virtual VT-d
>>>>>>>> support in KVM.
>>>>>>>>
>>>>>>>> Another quick way to enable more than 256 VCPUs is to eliminate the
>>>>>>>> dependency between irq remapping and X2APIC in the guest Linux
>>>>>>>> kernel. So far I can boot the guest after removing the dependency.
>>>>>>>> The side effect I can think of is that irqs can only be delivered
>>>>>>>> to vcpus 0~255, but 256 vcpus seem enough to balance irq requests
>>>>>>>> in the guest. In most cases, there are only a few devices in the
>>>>>>>> guest.
>>>>>>>>
>>>>>>>> I wonder whether this is feasible. There may be some other side
>>>>>>>> effects I didn't think of. I would very much appreciate your
>>>>>>>> comments.
>>>>>>>
>>>>>>> Radim is working on the KVM side already, Peter is currently
>>>>>>> driving the VT-d interrupt emulation topic in QEMU. It's in reach,
>>>>>>> I would say. :)
>>>>>>
>>>>>> + Igor extends QEMU to support more than 255 in internal structures
>>>>>> and ACPI.  What remains mostly untracked is Seabios/OVMF.
>>>>>
>>>>> If we don't want interrupts from internal devices to be delivered to
>>>>> CPU > 255, do we still need the VT-d interrupt remapping emulation? I
>>>>> think firmware is able to send IPIs to wake up APs even without IR,
>>>>> and the OS is able to do it too. So basically, support in KVM and
>>>>> QEMU alone is enough.
>>>>
>>>> What are "internal devices" for you? And which OS do you know that
>>>> would handle such artificial setups without prior massive patching?
>>>
>>> Sorry, a typo. I mean the external devices of IOAPIC/MSI/MSIX. Doesn't
>>> current Linux use x2apic without IR in VM?
>>
>> If and only if no more than 254 CPUs need to be addressed.
>>
>>>
>>>>
>>>> We do need VT-d IR emulation in order to present our guest a
>>>> well-specified and supported architecture for running > 255 CPUs.
>>>
>>> I mean in Tianyu's case, if he doesn't care about delivering external
>>> interrupts to CPU >255, IR is not required.
>>
>> What matters is the guest OS. See my other reply on this why this
>> doesn't work, even for Linux.
> 
> Since there are only a few devices in his case, setting the irq affinity
> manually is enough.

Ah, wait - are we talking about emulating the Xeon Phi architecture in
QEMU, accelerated by KVM?

Then maybe you can point to a more detailed description of its interrupt
architecture than the rather vague "Xeon Phi Coprocessor System
Software Developers Guide" I was just looking at. While the Phi may not
have VT-d internally, it still needs to translate incoming MSI/MSI-X
messages (via that PEG port?) to something that can address more than
255 APIC IDs, no?

Possibly, you only need an extended KVM kernel interface for the Phi
that allows injecting APIC interrupts to more than 255 CPUs. That
interface has to be designed anyway, for normal x86 systems, and is what
Radim was talking about.
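
To make the 8-bit limit discussed earlier in this thread concrete, here
is a minimal sketch of how a compatibility-format (non-remapped) MSI
address encodes its destination. The helper names are hypothetical and
not taken from the kernel or QEMU; the only thing assumed is the
standard x86 layout, with the destination ID in address bits 19:12. Any
APIC ID above 255 simply cannot be represented there:

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define MSI_ADDR_BASE       0xfee00000u /* fixed upper bits of an x86 MSI address */
#define MSI_ADDR_DEST_SHIFT 12          /* destination ID sits in bits 19:12 */
#define MSI_ADDR_DEST_MASK  0xffu       /* only 8 bits of destination ID */

/* Build a compatibility-format MSI address; upper APIC ID bits are lost. */
static uint32_t msi_addr_for_apic_id(uint32_t apic_id)
{
    return MSI_ADDR_BASE |
           ((apic_id & MSI_ADDR_DEST_MASK) << MSI_ADDR_DEST_SHIFT);
}

/* Extract the destination ID that a receiver would see. */
static uint32_t msi_addr_dest(uint32_t addr)
{
    return (addr >> MSI_ADDR_DEST_SHIFT) & MSI_ADDR_DEST_MASK;
}

/* Returns 1 if the APIC ID survives the round trip through the address. */
static int apic_id_fits_msi(uint32_t apic_id)
{
    return msi_addr_dest(msi_addr_for_apic_id(apic_id)) == apic_id;
}
```

With this encoding, apic_id_fits_msi(255) holds while
apic_id_fits_msi(256) does not, because ID 256 aliases to destination 0.
That is the gap that either interrupt remapping or an extended
injection interface has to close.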

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux