Hello,

While running an AIM7 workload (workfile.high_systime) in a single 40-way (or a single 60-way) KVM guest, I noticed pretty bad performance when the guest was booted with a 3.3.1 kernel compared to the same guest booted with the 2.6.32-220 (RHEL6.2) kernel. I am still trying to dig into the details here, and I am wondering if some changes in the upstream kernel (i.e. since 2.6.32-220) might be causing this to show up in a guest environment (especially for this system-intensive workload).

Has anyone else observed this kind of behavior? Is it a known issue with a fix in the pipeline? If not, are there any special knobs/tunables that need to be explicitly set/cleared when using newer kernels like 3.3.1 in a guest?

I have included some info below. Any pointers on what else I could capture that would be helpful are also welcome.

Thanks!
Vinod

---

Platform used : DL980 G7 (80 cores + 128G RAM). Hyper-threading is turned off.

Workload used : AIM7 (workfile.high_systime) using RAM disks. This is primarily a CPU-intensive workload, with very little I/O.

Software used :
  qemu-system-x86_64 : 1.0.50 (i.e. latest as of about a week or so ago)
  Native/Host OS     : 3.3.1 (SLUB allocator explicitly enabled)
  Guest-RunA OS      : 2.6.32-220 (i.e. RHEL6.2 kernel)
  Guest-RunB OS      : 3.3.1

Guest was pinned on :
  numa nodes 4,5,6,7   -> 40 VCPUs + 64G (i.e. 40-way guest)
  numa nodes 2,3,4,5,7 -> 60 VCPUs + 96G (i.e. 60-way guest)

For the 40-way guest, Guest-RunA (2.6.32-220 kernel) performed nearly 9x better than Guest-RunB (3.3.1 kernel). In the case of the 60-way guest run, the older guest kernel was nearly 12x better!

For the Guest-RunB (3.3.1) case I ran "mpstat -P ALL 1" on the host and observed that a very high % of time was being spent by the CPUs outside guest mode, mostly in the host (i.e. %sys). Looking at the "perf" traces, it seemed like there were long pauses in the guest, perhaps waiting for the zone->lru_lock as part of release_pages(), and this caused VT's PLE related code to kick in on the host.

I then turned on function tracing and found that more time appears to be spent around the lock code in the 3.3.1 guest than in the 2.6.32-220 guest. Here is a small sampling of these traces (sketches of the pinning and trace-capture commands are appended at the end of this mail). Notice the timestamp jump around "_raw_spin_lock_irqsave <-release_pages" in the Guest-RunB case.

1) 40-way Guest-RunA (2.6.32-220 kernel):
-----------------------------------------
# TASK-PID    CPU#   ||||    TIMESTAMP  FUNCTION
<...>-32147  [020]  145783.127452: native_flush_tlb <-flush_tlb_mm
<...>-32147  [020]  145783.127452: free_pages_and_swap_cache <-unmap_region
<...>-32147  [020]  145783.127452: lru_add_drain <-free_pages_and_swap_cache
<...>-32147  [020]  145783.127452: release_pages <-free_pages_and_swap_cache
<...>-32147  [020]  145783.127452: _spin_lock_irqsave <-release_pages
<...>-32147  [020]  145783.127452: __mod_zone_page_state <-release_pages
<...>-32147  [020]  145783.127452: mem_cgroup_del_lru_list <-release_pages
...
<...>-32147  [022]  145783.133536: release_pages <-free_pages_and_swap_cache
<...>-32147  [022]  145783.133536: _spin_lock_irqsave <-release_pages
<...>-32147  [022]  145783.133536: __mod_zone_page_state <-release_pages
<...>-32147  [022]  145783.133536: mem_cgroup_del_lru_list <-release_pages
<...>-32147  [022]  145783.133537: lookup_page_cgroup <-mem_cgroup_del_lru_list

2) 40-way Guest-RunB (3.3.1):
-----------------------------
# TASK-PID    CPU#   ||||    TIMESTAMP  FUNCTION
<...>-16459  [009]  ....  101757.383125: free_pages_and_swap_cache <-tlb_flush_mmu
<...>-16459  [009]  ....  101757.383125: lru_add_drain <-free_pages_and_swap_cache
<...>-16459  [009]  ....  101757.383125: release_pages <-free_pages_and_swap_cache
<...>-16459  [009]  ....  101757.383125: _raw_spin_lock_irqsave <-release_pages
<...>-16459  [009]  d...  101757.384861: mem_cgroup_lru_del_list <-release_pages
<...>-16459  [009]  d...  101757.384861: lookup_page_cgroup <-mem_cgroup_lru_del_list
....
<...>-16459  [009]  .N..  101757.390385: release_pages <-free_pages_and_swap_cache
<...>-16459  [009]  .N..  101757.390385: _raw_spin_lock_irqsave <-release_pages
<...>-16459  [009]  dN..  101757.392983: mem_cgroup_lru_del_list <-release_pages
<...>-16459  [009]  dN..  101757.392983: lookup_page_cgroup <-mem_cgroup_lru_del_list
<...>-16459  [009]  dN..  101757.392983: __mod_zone_page_state <-release_pages
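---

A few more details on the setup and data collection, in case they help:

The NUMA pinning of the guest was done roughly along the lines below. This is only a sketch (the exact qemu invocation is not reproduced here; disk/network options are omitted, the image path is a placeholder, and the -smp/-m values shown are for the 40-way case):

  # Bind qemu (VCPU threads and guest RAM) to host NUMA nodes 4-7
  # before launching the 40-way guest.
  numactl --cpunodebind=4,5,6,7 --membind=4,5,6,7 \
      qemu-system-x86_64 -enable-kvm -smp 40 -m 65536 \
      -drive file=/path/to/guest.img,if=virtio -nographic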
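The function traces shown above were captured with ftrace inside each guest, roughly as follows (a minimal sketch, assuming debugfs is mounted in the guest; the actual capture window and any function filters I applied are not shown):

  # Enable the function tracer and grab a few seconds of the run
  # once AIM7 is in its steady state.
  cd /sys/kernel/debug/tracing
  echo function > current_tracer
  echo 1 > tracing_on
  sleep 5
  echo 0 > tracing_on
  cat trace > /tmp/ftrace-guest.txt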
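The host-side PLE activity can be cross-checked via the kvm_intel module parameters and the kvm:kvm_exit tracepoint. Again just a sketch (this assumes Intel VT, i.e. the kvm_intel module is loaded):

  # Is PLE enabled, and with what gap/window? (on the host)
  cat /sys/module/kvm_intel/parameters/ple_gap
  cat /sys/module/kvm_intel/parameters/ple_window

  # Record VM exits system-wide for 10 seconds of the run; PLE shows
  # up as "PAUSE_INSTRUCTION" exit reasons in the raw events.
  perf record -a -e kvm:kvm_exit sleep 10
  perf script | grep -c PAUSE_INSTRUCTION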