I see really slow vmalloc performance on 2.6.35-rc3:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
3) 3.581 us | vfree();
3) | msr_io() {
3) ! 523.880 us | vmalloc();
3) 1.702 us | vfree();
3) ! 529.960 us | }
3) | msr_io() {
3) ! 564.200 us | vmalloc();
3) 1.429 us | vfree();
3) ! 568.080 us | }
3) | msr_io() {
3) ! 578.560 us | vmalloc();
3) 1.697 us | vfree();
3) ! 584.791 us | }
3) | msr_io() {
3) ! 559.657 us | vmalloc();
3) 1.566 us | vfree();
3) ! 575.948 us | }
3) | msr_io() {
3) ! 536.558 us | vmalloc();
3) 1.553 us | vfree();
3) ! 542.243 us | }
3) | msr_io() {
3) ! 560.086 us | vmalloc();
3) 1.448 us | vfree();
3) ! 569.387 us | }
msr_io() is from arch/x86/kvm/x86.c, allocating at most 4K (yes it
should use kmalloc()). The memory is immediately vfree()ed. There are
96 entries in /proc/vmallocinfo, and the whole thing is single threaded
so there should be no contention.
Here's the perf report:
63.97% qemu [kernel] [k] rb_next
|
--- rb_next
|
|--70.75%-- alloc_vmap_area
| __get_vm_area_node
| __vmalloc_node
| vmalloc
| |
| |--99.15%-- msr_io
| | kvm_arch_vcpu_ioctl
| | kvm_vcpu_ioctl
| | vfs_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | __GI_ioctl
| | |
| | --100.00%-- 0x1dfc4a8878e71362
| |
| --0.85%-- __kvm_set_memory_region
| kvm_set_memory_region
| kvm_vm_ioctl_set_memory_region
| kvm_vm_ioctl
| vfs_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call
| __GI_ioctl
|
--29.25%-- __get_vm_area_node
__vmalloc_node
vmalloc
|
|--98.89%-- msr_io
| kvm_arch_vcpu_ioctl
| kvm_vcpu_ioctl
| vfs_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call
| __GI_ioctl
| |
| --100.00%-- 0x1dfc4a8878e71362
It seems completely wrong - iterating 8 levels of a binary tree
shouldn't take half a millisecond.
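
To put numbers on that, here is a minimal userspace sketch of the two access patterns. It is not the kernel's rbtree code: struct node, next_node(), make_tree(), descent_steps() and walk_steps() are hypothetical names made up for illustration, and the tree is a plain balanced binary search tree rather than a red-black tree. next_node() is an in-order successor step, which is the job rb_next() does. Whether alloc_vmap_area() really walks entries one by one is only an inference from rb_next dominating the profile; I have not confirmed it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal tree node; NOT the kernel's struct rb_node. */
struct node {
	struct node *parent, *left, *right;
	unsigned long key;
};

/* In-order successor: the same job rb_next() does in the kernel. */
static struct node *next_node(struct node *n)
{
	if (n->right) {
		/* Leftmost node of the right subtree. */
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	/* Climb until we arrive at a parent from its left child. */
	while (n->parent && n == n->parent->right)
		n = n->parent;
	return n->parent;
}

/* Link nodes[lo..hi), already sorted by key, into a balanced tree. */
static struct node *build(struct node *nodes, size_t lo, size_t hi,
			  struct node *parent)
{
	size_t mid;
	struct node *n;

	if (lo >= hi)
		return NULL;
	mid = lo + (hi - lo) / 2;
	n = &nodes[mid];
	n->parent = parent;
	n->left = build(nodes, lo, mid, n);
	n->right = build(nodes, mid + 1, hi, n);
	return n;
}

/* Give node i the key i and return the root of the balanced tree. */
static struct node *make_tree(struct node *nodes, size_t n)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		nodes[i].key = i;
	return build(nodes, 0, n, NULL);
}

/* Nodes touched by a root-to-target descent for key. */
static size_t descent_steps(struct node *root, unsigned long key)
{
	size_t steps = 0;

	while (root) {
		steps++;
		if (key == root->key)
			break;
		root = key < root->key ? root->left : root->right;
	}
	return steps;
}

/* Nodes touched by walking the whole tree with successor steps. */
static size_t walk_steps(struct node *root)
{
	size_t steps = 0;
	struct node *n;

	if (!root)
		return 0;
	while (root->left)
		root = root->left;
	for (n = root; n; n = next_node(n))
		steps++;
	return steps;
}
```

With 96 nodes (the number of /proc/vmallocinfo entries), a descent touches at most 7 nodes in this model, while a full successor walk touches all 96. Even the full walk over 96 entries should not come close to half a millisecond, so if the time really is spent stepping with rb_next, the tree presumably holds far more entries than vmallocinfo shows.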
--
error compiling committee.c: too many arguments to function