Orit,

First, thank you for this work; it's very interesting. I tried the patchset
but ran into a problem: a kernel panic in the L2 guest, while L1 and L0
remain operable:

BUG: unable to handle kernel paging request at 0104b00d
IP: [<c0105282>] math_state_restore+0xe/0x2f
*pde = 00000000

In my environment, the L1 hypervisor is 32-bit KVM (kernel version 2.6.25).
The complete serial.log of L2 is attached.

Do you know how I can get past this hang?

Thanks,
Qing

On Mon, 2009-08-17 at 21:48 +0800, oritw@xxxxxxxxxx wrote:
> From: Orit Wasserman <oritw@xxxxxxxxxx>
>
> This patch implements nested VMX support. It enables a guest to use the
> VMX APIs in order to run its own nested guest (i.e., it enables
> running other hypervisors which use VMX under KVM). The current patch
> supports running Linux under a nested KVM. Additional patches for
> running Windows under nested KVM, and Linux and Windows under nested
> VMware server(!), are currently running in the lab. We are in the
> process of forward-porting those patches to -tip.
>
> The current patch only supports a single nested hypervisor, which can
> only run a single guest. SMP is not supported yet when running nested
> hypervisor (work in progress). Only 64 bit nested hypervisors are
> supported. Currently only EPT mode in both host and nested hypervisor
> is supported (i.e., both hypervisors must use EPT).
>
> This patch was written by:
> Orit Wasserman, oritw@xxxxxxxxxx
> Ben-Ami Yassour, benami@xxxxxxxxxx
> Abel Gordon, abelg@xxxxxxxxxx
> Muli Ben-Yehuda, muli@xxxxxxxxxx
>
> With contributions by
> Anthony Liguori, aliguori@xxxxxxxxxx
> Mike Day, mdday@xxxxxxxxxx
>
> This work was inspired by the nested SVM support by Alexander Graf and
> Joerg Roedel.
>
> Signed-off-by: Orit Wasserman <oritw@xxxxxxxxxx>

Linux version 2.6.25 (root@xxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Wed Mar 18 13:12:03 CST 2009
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000031ff0000 (usable)
BIOS-e820: 0000000031ff0000 - 0000000032000000 (ACPI data)
BIOS-e820: 00000000fffbd000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
799MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at [c00fb540] 000fb540
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 204784
HighMem 204784 -> 204784
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 204784
DMI 2.4 present.
Using APIC driver default
ACPI: RSDP 000FB660, 0014 (r0 QEMU )
ACPI: RSDT 31FF0000, 002C (r1 QEMU QEMURSDT 1 QEMU 1)
ACPI: FACP 31FF002C, 0074 (r1 QEMU QEMUFACP 1 QEMU 1)
ACPI: DSDT 31FF0100, 24A4 (r1 BXPC BXDSDT 1 INTL 20061109)
ACPI: FACS 31FF00C0, 0040
ACPI: APIC 31FF25A8, 00E0 (r1 QEMU QEMUAPIC 1 QEMU 1)
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] disabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] disabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] disabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] disabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] disabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] disabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] disabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 40000000 (gap: 32000000:cdfbd000)
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 203185
Kernel command line: ro root=LABEL=/ rhgb console=ttyS0 console=tty0
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 3200.275 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
console [ttyS0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 803312k/819136k available (3148k kernel code, 15252k reserved, 1651k data, 284k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xffe14000 - 0xfffff000 (1964 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
vmalloc : 0xf2800000 - 0xff7fe000 ( 207 MB)
lowmem : 0xc0000000 - 0xf1ff0000 ( 799 MB)
.init : 0xc05b9000 - 0xc0600000 ( 284 kB)
.data : 0xc0413196 - 0xc05b007c (1651 kB)
.text : 0xc0100000 - 0xc0413196 (3148 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=12, HWalign=64, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 6474.79 BogoMIPS (lpj=12949587)
Mount-cache hash table entries: 512
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
BUG: unable to handle kernel paging request at 0104b00d
IP: [<c0105282>] math_state_restore+0xe/0x2f
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:

Pid: 0, comm: swapper Not tainted (2.6.25 #1)
EIP: 0060:[<c0105282>] EFLAGS: 00010286 CPU: 0
EIP is at math_state_restore+0xe/0x2f
EAX: 80050033 EBX: 0104b000 ECX: 00000000 EDX: 000000d8
ESI: c05ae000 EDI: c164a818 EBP: 00000020 ESP: c05affdc
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c05ae000 task=c054f3a0 task.ti=c05b2000)
Stack: 00000000 c05b0000 c01050e1 00000000 00000000 000000d8 c05b0000 c164a818 00000020
Call Trace:
[<c01050e1>] device_not_available+0x2d/0x32
=======================
Code: 95 d4 00 00 00 83 c4 14 5b 5e 5f 5d c3 52 50 68 f3 37 52 c0 e8 18 8d 01 00 83 c4 0c c3 56 53 89 e6 81 e6 00 e0 ff ff 8b 1e 0f 06 <f6> 43 0d 20 75 07 89 d8 e8 5d 36 00 00 90 dd a3 70 02 00 00 83
EIP: [<c0105282>] math_state_restore+0xe/0x2f SS:ESP 0068:c05affdc
---[ end trace ca143223eefdc828 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
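
The register dump in the oops can be cross-checked with a little arithmetic. The sketch below is only a reading aid, not a diagnosis. It assumes 8 KB kernel stacks, which is what the `81 e6 00 e0 ff ff` (`and $0xffffe000,%esi`) bytes in the Code line suggest, and the helper names `thread_info_base` and `faulting_offset` are hypothetical. It checks that masking ESP gives the `ti=` value held in ESI, and that the faulting address is EBX plus the 0x0d displacement of the marked `<f6> 43 0d 20` instruction (`testb $0x20,0xd(%ebx)`).

```python
# Values copied verbatim from the oops register dump.
ESP = 0xC05AFFDC
ESI = 0xC05AE000         # also printed as ti=c05ae000
EBX = 0x0104B000         # loaded by "8b 1e" (mov (%esi),%ebx)
FAULT_ADDR = 0x0104B00D  # "unable to handle kernel paging request at ..."

# Assumption: 8 KB kernel stacks, matching the 0xffffe000 mask in the Code bytes.
THREAD_SIZE = 0x2000


def thread_info_base(esp: int) -> int:
    """Round a 32-bit stack pointer down to the start of its stack area."""
    return esp & ~(THREAD_SIZE - 1) & 0xFFFFFFFF


def faulting_offset(fault_addr: int, base: int) -> int:
    """Displacement of the faulting access relative to a base register."""
    return fault_addr - base


if __name__ == "__main__":
    print(hex(thread_info_base(ESP)))             # 0xc05ae000, equal to ESI
    print(hex(faulting_offset(FAULT_ADDR, EBX)))  # 0xd
```

Both checks agree with the dump: the value read through ESI into EBX (0104b000) is not a kernel lowmem address (lowmem starts at 0xc0000000 in the layout printed earlier), which is consistent with the access at EBX+0xd faulting.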