Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)

Lukáš Doktor <ldoktor@xxxxxxxxxx> · Tue, 01 Dec 2009 13:23:23 +0100

Dne 29.11.2009 17:17, Dor Laor napsal(a):
On 11/26/2009 12:11 PM, Lukáš Doktor wrote:
Hello Dor,

Thank you for your review. I have few questions about your comments:

----------- snip -----------
+ stat += "Guests memsh = {"
+ for vm in lvms:
+ if vm.is_dead():
+ logging.info("Trying to get informations of death VM: %s"
+ % vm.name)
+ continue

You can fail the entire test. Afterwards it will be hard to find the
issue.

Well if it's what the community wants, we can change it. We just didn't
want to lose information about the rest of the systems. Perhaps we can
set some DIE flag and after collecting all statistics raise an Error.

I don't think we need to continue testing if some thing as basic as VM
died upon us.
OK, we are going to change this.

----------- snip -----------
+ def get_true_pid(vm):
+ pid = vm.process.get_pid()
+ for i in range(1,10):
+ pid = pid + 1

What are you trying to do here? It's seems like a nasty hack that might
fail on load.

qemu has -pifile option. It works fine.

Oh my, I haven't thought on this. Of course I'm going to use -pidfile 
instead of this silly thing...

Yes and I'm really sorry for this ugly hack. The qemu command has
changed since the first patch was made. Nowadays the "vm.pid" returns
PID of the command itself, not the actual qemu process.
We need to have the PID of the actual qemu process, which is executed by
the command with PID "vm.pid". That's why first I try finding the qemu
process as the following "vm.pid" PID. I haven't found another solution
yet (in case we don't want to change the qemu command back in the
framework).
We have tested this solution under heavy process load and either first
or second part always finds the right value.

----------- snip -----------
+ if (params['ksm_test_size'] == "paralel") :
+ vmsc = 1
+ overcommit = 1
+ mem = host_mem
+ # 32bit system adjustment
+ if not params['image_name'].endswith("64"):
+ logging.debug("Probably i386 guest architecture, "\
+ "max allocator mem = 2G")

Better not to relay on the guest name. You can test percentage of the
guest mem.

What do you mean by "percentage of the guest mem"? This adjustment is
made because the maximum memory for 1 process in 32 bit OS is 2GB.
Testing of the 'image_name' showed to be most reliable method we found.

It's not that important but it should be a convention of kvm autotest.
If that's acceptable, fine, otherwise, each VM will define it in the
config file

Yes kvm-autotest definitely need a way to decide whether this is 32 or 
64 bit guest. I'll send a separate email to KVM-autotest mailing list to 
let others express their opinions.

----------- snip -----------
+ # Guest can have more than 2G but kvm mem + 1MB (allocator itself)
+ # can't
+ if (host_mem> 2048):
+ mem = 2047
+
+
+ if os.popen("uname -i").readline().startswith("i386"):
+ logging.debug("Host is i386 architecture, max guest mem is 2G")

There are bigger 32 bit guests.

How do you mean this note? We are testing whether the host machine is 32
bit. If so, the maximum process allocation is 2GB (similar case to 32
bit guest) but this time the whole qemu process (2GB qemu machine + 64
MB qemu overhead) can't exceeded 2GB.
Still the maximum memory used in test is the same (as we increase the VM
count - host_mem = quest_mem * vm_count; quest_mem is decreased,
vm_count is increased)

i386 guests with PAE mode (additional 4 bits) can have up to 16G ram on
theory.

OK so we should first check whether PAE is on and separate into 3 groups 
(64bit-unlimited, PAE-16G, 32bit-2G).

----------- snip -----------
+
+ # Copy the allocator.c into guests

.py

yes indeed.

----------- snip -----------
+ # Let kksmd works (until shared mem rich expected value)
+ shm = 0
+ i = 0
+ cmd = "cat/proc/%d/statm" % get_true_pid(vm)
+ while shm< ksm_size:
+ if i> 64:
+ logging.info(get_stat(lvms))
+ raise error.TestError("SHM didn't merged the memory until "\
+ "the DL on guest: %s"% (vm.name))
+ logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
+ time.sleep(ksm_size / 200 * perf_ratio)
+ try:
+ shm = int(os.popen(cmd).readline().split()[2])
+ shm = shm * 4 / 1024
+ i = i + 1

Either you have nice statistic calculation function or not.
I vote for the first case.

Yes, we are using the statistics function for the output. But in this
case we just need to know the shm value, not to log anything.
If this is a big problem even for others, we can split the statistics
function into 2:
int = _get_stat(vm) - returns shm value
string = get_stat(vm) - Uses _get_stats and creates a nice log output

----------- snip -----------
+ """ Check if memory in max loading guest is allright"""
+ logging.info("Starting phase 3b")
+
+ """ Kill rest of machine"""

We should have a function for it for all kvm autotest

you think lsessions[i].close() instead of (status,data) =
lsessions[i].get_command_status_output("exit;",20)?
Yes, it would be better.

+ for i in range(last_vm+1, vmsc):
+ (status,data) = lsessions[i].get_command_status_output("exit;",20)
+ if i == (vmsc-1):
+ logging.info(get_stat([lvms[i]]))
+ lvms[i].destroy(gracefully = False)

----------- snip -----------
+ def phase_paralel():
+ """ Paralel page spliting """
+ logging.info("Phase 1: Paralel page spliting")
+ # We have to wait until allocator is finished (it waits 5 seconds to
+ # clean the socket
+

The whole function is very similar to phase_separate_first_guest please
refactor them.

Yes, those functions are a bit similar. On the other hand there are some
differences. Instead of creating more complex function we agreed to
split them for better readability of the code.

The trouble is that re-implementing it make the code longer and more
error prone

OK, we'll create only one function.

----------- snip -----------
+ while shm< ksm_size:
+ if i> 64:
+ logging.info(get_stat(lvms))
+ raise error.TestError("SHM didn't merged the memory until DL")
+ logging.debug("Sleep(%d)" % (ksm_size / 200 * perf_ratio))
+ time.sleep(ksm_size / 200 * perf_ratio)
+ try:
+ shm = int(os.popen(cmd).readline().split()[2])
+ shm = shm * 4 / 1024
+ except:
+ raise error.TestError("Could not fetch shmem info from proc")

Didn't you needed to increase i?

yes, you are right. This line somehow disappeard "i = i + 1"...

Does it pass all the tests?

Not incrementing the "i" could lead to infinite loop but only when KSM 
fails. That's why we haven't found the mistake by testing... Well fix 
this altogether with other modifications.

----------- snip -----------
+ def compare_page(self,original,inmem):
+ """
+ compare memory
+

Why do you need it? Is it to really check ksm didn't do damage?
Interesting, I never doubted ksm for that. Actually it is good idea to
test...

We were asked to do so (be paranoid, everything could happened). We can
make this voluntary in the config.

I guess it just add sanity, better leave it on always, I guess it's not
that time consuming.

+1
(Actually using python this is kind of time consuming. But compare to 
the whole test duration it's nothing.)

Once again thanks, I'm looking forward to your replay.

Best regards,
Lukáš Doktor
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html