[PATCH] some optimizations for Virtual Machines



Roland (and utrace-devel community),

	I have just completed, together with Andrea Gasparini, a first 
implementation of a kernel module based on utrace that provides fast support 
for our virtualization environment (view-os/umview). The module is named 
"kmview" (kernel-mode view-os), and the user level tool will have the same name.
We will release both the module and the user level program under the GPL as 
soon as possible.

utrace is a wonderful and well designed tool. However, IMHO, while
implementing kmview we found some improvements that could be made
(and that we have already implemented) to better support virtual
machines.

Here are some comments. I hope you will share our ideas and merge
our improvements into utrace's mainstream code soon.

1- Order of callbacks

You say: Engines are called in the order they attached.
This is meaningful for kernel generated events, but unfortunately it
does not give meaningful semantics for engine nesting when
applied to report_syscall_entry.

When several tracing/virtual machine tools are attached, the 
report_syscall_entry callbacks must be evaluated in the reverse order.

As an example, I tried to use strace on a view-os like virtual machine 
(where some syscalls get virtualized).
strace works, but since it is the last engine it shows the modified calls 
together with the return values of the original calls.

Wrong order:
syscall enter: call -> VM (modification) -> strace -> kernel
syscall exit: call -> VM (restore) -> strace -> kernel

Right order:
syscall enter: call -> strace -> VM (modification) -> kernel
syscall exit: call -> VM (restore) -> strace -> kernel

Reversing the attached engine list traversal for syscall_entry solves the
problem.
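
To make the ordering concrete, here is a minimal user-space sketch (not
utrace code). The struct and function names are hypothetical and only
model the traversal: engines are stored in attach order, the entry
report walks the list backwards, and the exit report walks it forwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENGINES 8

/* Hypothetical stand-in for an attached engine: only its name matters here. */
struct engine {
    const char *name;
};

/* Engines kept in attach order: index 0 attached first. */
struct engine_list {
    struct engine engines[MAX_ENGINES];
    size_t count;
};

static void attach_engine(struct engine_list *list, const char *name)
{
    assert(list->count < MAX_ENGINES);
    list->engines[list->count++].name = name;
}

/* Append one engine name to the call trace, separated by " -> ". */
static void record_call(char *out, size_t outlen, const char *name)
{
    if (out[0] != '\0')
        strncat(out, " -> ", outlen - strlen(out) - 1);
    strncat(out, name, outlen - strlen(out) - 1);
}

/* Syscall entry: the engine attached last is called first. */
static void report_syscall_entry(const struct engine_list *list,
                                 char *out, size_t outlen)
{
    out[0] = '\0';
    for (size_t i = list->count; i-- > 0; )
        record_call(out, outlen, list->engines[i].name);
}

/* Syscall exit: engines are called in attach order. */
static void report_syscall_exit(const struct engine_list *list,
                                char *out, size_t outlen)
{
    out[0] = '\0';
    for (size_t i = 0; i < list->count; i++)
        record_call(out, outlen, list->engines[i].name);
}
```

With the VM attached first and strace attached last, the entry trace
is "strace -> VM" and the exit trace is "VM -> strace", which matches
the "Right order" diagram.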

2- Access to traced process vm.

Your interface provides the call utrace_allow_access_process_vm: it allows 
tracer processes to use /proc/*/mem.
Unfortunately write access is denied (as stated in fs/proc/base.c):
> #define mem_write NULL
> #ifndef mem_write
> /* This is a security hazard */
Accessing a process's vm through /proc/*/mem would be useless anyway.
When I write virtual machine support for hundreds of processes I cannot
keep hundreds of files open. On the other hand, I cannot open and close a 
file for each memory access: we need fast access!

I propose a new call:
int utrace_access_process_vm(struct task_struct *tsk, unsigned long addr, char __user *ubuf, int len, int write, int string);
which gives read/write access to the memory of the process.
It has about the same interface as access_process_vm (mm/memory.c), with
an extra "string" option (meaningful only when write==0).
Sometimes a read buffer is significantly larger than the string it is 
meant to hold. If string==1 the transfer terminates at '\0', which avoids
the memory error that could arise from unallocated memory after the string 
(and gives a slight increase in performance).
Before giving access to the process vm, utrace_access_process_vm checks
that the caller has the right to do so by using 
utrace_allow_access_process_vm (so it has the same degree of protection as 
your access to /proc/*/mem).
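
The effect of the "string" flag on a read can be modelled in user space 
as below. This is only a sketch of the copy semantics, not the kernel 
code in the patch: model_read is a hypothetical name, a plain buffer 
stands in for the traced process's memory, and I assume here that the 
terminating '\0' itself is copied before the transfer stops.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy up to len bytes from src to dst and return the number of bytes
 * copied. When string is nonzero the copy stops right after the first
 * '\0' (assumption: the terminator is included in the transfer), so
 * bytes beyond the end of the string are never touched.
 */
static size_t model_read(const char *src, char *dst, size_t len, int string)
{
    size_t copied = 0;

    assert(src != NULL && dst != NULL);
    while (copied < len) {
        char c = src[copied];

        dst[copied++] = c;
        if (string && c == '\0')
            break;
    }
    return copied;
}
```

For a 16 byte buffer holding "ls", a read with string==1 transfers 3 
bytes, while the same read with string==0 transfers all 16.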

3- In the patch I have also implemented support for PTRACE_MULTI and 
PTRACE_SYSVM.
These two extra features provide:
-- PTRACE_MULTI: multiple ptrace operations in a single call, including
transfers of chunks of memory and of registers.
(It would speed up many commands: have a look at "strace strace ls"
to see how many bursts of ptrace calls could be collapsed!)
I designed this call for virtual machine support.
-- PTRACE_SYSVM: can be used instead of PTRACE_SYSCALL or SYSEMU.
At the end of the pre-syscall protocol it is possible to choose among three
different behaviors:
   i- call again after the syscall, since some parameters may have been 
	    modified by the virtualization (like PTRACE_SYSCALL)
   ii- skip the upcall after the syscall but do perform the syscall 
	    (for a non virtualized call)
   iii- skip both the system call and the second upcall event
	    (for a completely virtualized call).
PTRACE_SYSVM almost halves the number of context switches for virtual 
machines.
(SYSEMU works only for total virtual machines, while SYSVM also works
for partial virtual machines.)
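
The three choices can be summarised with a small user-space model. It 
is only a sketch: the enum constants and function names are 
hypothetical (they are not the names used in the patch), and the model 
just counts tracer upcalls per traced syscall.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three behaviors selectable at syscall entry. */
enum sysvm_choice {
    SYSVM_TRACE_EXIT,          /* i:   run the syscall, upcall again after it */
    SYSVM_SKIP_EXIT,           /* ii:  run the syscall, no second upcall      */
    SYSVM_SKIP_CALL_AND_EXIT   /* iii: skip the syscall and the second upcall */
};

struct syscall_outcome {
    int upcalls;           /* tracer stops caused by this syscall */
    int syscall_executed;  /* 1 if the kernel really runs the syscall */
};

static struct syscall_outcome sysvm_step(enum sysvm_choice choice)
{
    /* The entry upcall always happens; that is where the choice is made. */
    struct syscall_outcome out = { .upcalls = 1, .syscall_executed = 1 };

    switch (choice) {
    case SYSVM_TRACE_EXIT:
        out.upcalls = 2;
        break;
    case SYSVM_SKIP_EXIT:
        break;
    case SYSVM_SKIP_CALL_AND_EXIT:
        out.syscall_executed = 0;
        break;
    }
    return out;
}

/* Total number of upcalls for a sequence of per-syscall choices. */
static int total_upcalls(const enum sysvm_choice *choices, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += sysvm_step(choices[i]).upcalls;
    return total;
}
```

In this model a tracer that always asks for the second upcall pays two 
stops per syscall, while choices ii and iii pay one. The saving 
approaches one half when most calls need no exit upcall.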
There is an extensive description of SYSVM in some messages I sent some 
time ago on KDML. We had already implemented these features on the vanilla 
kernel; this version, based on utrace, is architecture independent.

The complete patch is here:
Unfortunately it is against 2.4.22. I have a very slow connection to the
Internet here, I'll try to update the patch to the latest kernel as
soon as I return home.

Renzo Davoli				| Dept. of Computer Science
(NIC rd235, HAM IZ4DJE)                 | University of Bologna	
Tel. +39 051 2094501			| Mura Anteo Zamboni, 7
Fax. +39 051 2094510			| I-40127 Bologna  ITALY
Key fingerprint = A019 17E2 5562 06F6 77BB  2E93 1A01 F646 30EA B487
