On 06/10/2009 07:52 PM, Michael Goldish wrote:
----- "Yolkfull Chow"<yzhou@xxxxxxxxxx> wrote:
On 06/10/2009 06:03 PM, Michael Goldish wrote:
----- "Yolkfull Chow"<yzhou@xxxxxxxxxx> wrote:
On 06/09/2009 05:44 PM, Michael Goldish wrote:
The test looks pretty nicely written. Comments:
1. Consider making all the cloned VMs use image snapshots:
curr_vm = vm1.clone()
curr_vm.get_params()["extra_params"] += " -snapshot"
I'm not sure it's a good idea to let all VMs use the same disk image.
Or maybe you shouldn't add -snapshot yourself, but rather do it in the
config file for the first VM, and then all cloned VMs will have -snapshot
as well.
Yes, I use 'image_snapshot = yes' in the config file.
2. Consider changing the message
" Booting the %dth guest" % num
to
"Booting guest #%d" % num
(because there's no such thing as 2th and 3th)
3. Consider changing the message
"Cannot boot vm anylonger"
to
"Cannot create VM #%d" % num
4. Why not add curr_vm to vms immediately after cloning it?
That way you can kill it in the exception handler later, without having
to send it a 'quit' if you can't login ('if not curr_vm_session').
Yes, good idea.
5. " %dth guest boots up successfully" % num --> again, 2th and 3th
make no sense.
Also, I wonder why you add those spaces before every info message.
6. "%dth guest's session is not responsive" --> same
(maybe use "Guest session #%d is not responsive" % num)
7. "Shut down the %dth guest" --> same
(maybe "Shutting down guest #%d"? or destroying/killing?)
8. Shouldn't we fail the test when we find an unresponsive session?
It seems you just display an error message. You can simply replace
logging.error( with raise error.TestFail(.
9. Consider using a stricter test than just vm_session.is_responsive().
vm_session.is_responsive() just sends ENTER to the sessions and returns
True if it gets anything as a result (usually a prompt, or even just a
newline echoed back). If the session passes this test it is indeed
responsive, so it's a decent test, but maybe you can send some command
(user configurable?) and test for some output. I'm really not sure this
is important, because I can't imagine a session would respond to a newline
but not to other commands, but who knows. Maybe you can send the first VM
a user-specified command when the test begins, remember the output, and
then send all other VMs the same command and make sure the output is the
same.
Maybe use 'info status', and send the command 'help' via the session to
the VMs and compare their output?
I'm not sure I understand. What does 'info status' do? We're talking about
an SSH shell, not the monitor. You can do whatever you like, like 'uname -a',
and 'ls /', but you should leave it up to the user to decide, so he/she
can specify different commands for different guests. Linux commands won't
work under Windows, so Linux and Windows must have different commands in
the config file. In the Linux section, under '- @Linux:' you can add
something like:
    stress_boot:
        stress_boot_test_command = uname -a
and under '- @Windows:':
    stress_boot:
        stress_boot_test_command = ver && vol
These commands are just naive suggestions. I'm sure someone can think of
much more informative commands.
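A minimal sketch of the comparison described above, for illustration only. FakeSession and its get_command_output() method are hypothetical stand-ins for a real SSH session object and are not taken from kvm_utils; only the parameter name stress_boot_test_command comes from the suggestion itself.

```python
class FakeSession(object):
    """Stand-in for an SSH session; a real one would talk to the guest."""

    def __init__(self, canned_output):
        self.canned_output = canned_output

    def get_command_output(self, command):
        # Hypothetical method: run `command` in the guest and return its output.
        return self.canned_output


def find_mismatched_sessions(sessions, test_command):
    """Return the 1-based numbers of guests whose output differs from guest #1."""
    reference = sessions[0].get_command_output(test_command)
    mismatched = []
    for num, session in enumerate(sessions[1:], 2):
        if session.get_command_output(test_command) != reference:
            mismatched.append(num)
    return mismatched


params = {"stress_boot_test_command": "uname -a"}
sessions = [FakeSession("Linux guest"), FakeSession("Linux guest"), FakeSession("")]
bad = find_mismatched_sessions(sessions, params.get("stress_boot_test_command"))
print("Guests with unexpected output: %s" % bad)
```

The first guest's output serves as the reference, so the check needs no per-OS expected string in the config file, only the command.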
Those are really good suggestions. Thanks, Michael. Can I use
'migration_test_command' instead?
Not really. Why would you want to use another test's param?
1. There's no guarantee that 'migration_test_command' is defined
for your boot stress test. In fact, it is probably only defined for
migration tests, so you probably won't be able to access it. Try
params.get('migration_test_command') in your test and you'll probably
get None.
2. The user may not want to run migration at all, and then he/she
will probably not define 'migration_test_command'.
3. The user might want to use different test commands for migration
and for the boot stress test.
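To illustrate point 1, here is a small sketch that assumes params behaves like a plain Python dict (the contents below are made up for the example). A key that was never defined for this test simply comes back as None:

```python
# Hypothetical params for the boot stress test; only its own command is defined.
params = {"stress_boot_test_command": "uname -a"}

# dict.get() returns None for a key that was never defined.
print(params.get("migration_test_command"))

test_command = params.get("stress_boot_test_command")
if not test_command:
    raise ValueError("stress_boot_test_command is not defined")
print(test_command)
```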
10. I'm not sure you should use the param "kill_vm_gracefully" because that's
a postprocessor param (probably not your business). You can just call
destroy() in the exception handler with gracefully=False, because if the VMs
are non-responsive, I don't expect them to shut down nicely with an SSH
command (that's what gracefully does). Also, we're using -snapshot, so
there's no reason to shut them down nicely.
Yes, I agree. :)
11. "Total number booted successfully: %d" % (num - 1) --> why not just num?
We really have num VMs including the first one.
Or you can say: "Total number booted successfully in addition to the first one"
but that's much longer.
After the first guest boots, I set num = 1, and then do 'num += 1' at the
start of the while loop (to get a new VM). So curr_vm is now vm2 (num is 2).
If the second VM fails to boot, the number booted successfully should be
(num - 1).
I would use enumerate(vms), as Uri suggested, to make the numbers easier
to count.
OK, I didn't notice that.
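A rough sketch of the enumerate() approach, assuming vms is a plain list of the VMs that booted (the strings below are placeholders for VM objects). The guest number and the total then come from the list itself rather than from a hand-maintained counter:

```python
def describe_guests(vms):
    """Return one log line per VM, numbered from 1, plus a total line."""
    lines = []
    for index, vm in enumerate(vms):
        lines.append("Guest #%d: %s" % (index + 1, vm))
    lines.append("Total number booted successfully: %d" % len(vms))
    return lines


vms = ["vm1", "vm2", "vm3"]  # placeholder names standing in for VM objects
for line in describe_guests(vms):
    print(line)
```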
12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If
num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the
test will always fail (which is depressing). If params.get("max_vms") is
None or "", or in short -- 'if not params.get("max_vms")', disable this
feature and keep adding VMs forever. The user can enable the feature with:
max_vms = 50
or disable it with:
max_vms =
This is a good idea, given the hardware resource limits of the host.
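One possible way to read such a parameter, sketched under the assumption that params is a dict of strings. The helper names parse_max_vms and should_stop are invented for this example:

```python
def parse_max_vms(params):
    """Return the VM limit as an int, or None when the limit is disabled."""
    value = params.get("max_vms")
    if not value:  # covers both a missing key (None) and an empty "max_vms ="
        return None
    return int(value)


def should_stop(num, max_vms):
    """True once a limit is set and num running VMs have reached it."""
    return max_vms is not None and num >= max_vms


print(parse_max_vms({"max_vms": "50"}))
print(parse_max_vms({"max_vms": ""}))
print(should_stop(50, parse_max_vms({"max_vms": "50"})))
```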
13. Why are you catching OSError? If you get OSError it might be a
framework bug.
Sometimes vm.create() succeeds but the SSH login then fails because the
running Python process cannot allocate memory (OSError).
Adding max_vms could fix this problem, I think.
Do you remember exactly where OSError was thrown? Do you happen to have
a backtrace? (I just want to be very sure it's not a bug.)
The OSError was thrown while checking that all VMs are responsive, and I got
many tracebacks saying "OSError: [Errno 12] Cannot allocate memory".
Maybe the last VM was created successfully by luck, but Python could not
get memory after that while checking all the sessions.
So can we now catch the OSError and tell the user that max_vms is too large?
Sure. I was just worried it might be a framework bug. If it's a legitimate
memory error -- catch it and fail the test.
If you happen to catch that OSError again, and get a backtrace, I'd like
to see it if that's possible.
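A sketch of how the memory error might be told apart from other OSErrors. TestFail here is a local stand-in for autotest's error.TestFail, and login_or_fail is a made-up helper name. Errno 12 is ENOMEM on Linux, so only that case is turned into a test failure:

```python
import errno


class TestFail(Exception):
    """Local stand-in for autotest's error.TestFail."""


def login_or_fail(login, num):
    """Call login(); turn an out-of-memory OSError into a test failure."""
    try:
        return login()
    except OSError as e:
        if e.errno == errno.ENOMEM:
            raise TestFail("Host ran out of memory while logging into "
                           "guest #%d; max_vms may be too large" % num)
        raise  # any other OSError may be a framework bug, so let it through


def failing_login():
    # Simulates os.fork() failing inside the login code.
    raise OSError(errno.ENOMEM, "Cannot allocate memory")


try:
    login_or_fail(failing_login, 5)
except TestFail as e:
    print("Test failed: %s" % e)
```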
Michael, these are the backtrace messages:
...
20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Postprocessing on error...
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: VM object found in environment
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: send_monitor_cmd: Sending monitor command: screendump /kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Contents of environment: {'vm__vm1': <kvm_vm.VM instance at 0x92999a28>}
post-test sysinfo error:
Traceback (most recent call last):
File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
fn(*args, **dargs)
File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in log_after_each_test
log.run(test_sysinfodir)
File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
shell=True, env=env)
File "/usr/lib64/python2.4/subprocess.py", line 412, in call
return Popen(*args, **kwargs).wait()
File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
errread, errwrite)
File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
2009-06-11 06:50:02,859 Configuring logger for client level
FAIL
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
timestamp=1244717402 localtime=Jun 11 06:50:02 Unhandled OSError:
[Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/kvm-autotest/client/common_lib/test.py", line 304, in _exec
self.execute(*p_args, **p_dargs)
File "/kvm-autotest/client/common_lib/test.py", line 187, in execute
self.run_once(*args, **dargs)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145, in run_once
routine_obj.routine(self, params, env)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in run_boot_vms
curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in wait_for
output = func()
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py", line 728, in ssh_login
session = kvm_utils.ssh(address, port, username, password, prompt, timeout)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
return remote_login(command, password, prompt, "\n", timeout)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in remote_login
sub = kvm_spawn(command, linesep)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in __init__
(pid, fd) = pty.fork()
File "/usr/lib64/python2.4/pty.py", line 108, in fork
pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 1
END FAIL
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>
timestamp=1244717403 localtime=Jun 11 06:50:03
Dropping caches
2009-06-11 06:50:03,409 running: sync
JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
execfile(self.control, global_control_vars, global_control_vars)
File "/kvm-autotest/client/control", line 1030, in ?
cfg_to_test("kvm_tests.cfg")
File "/kvm-autotest/client/control", line 1013, in cfg_to_test
current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
utils.drop_caches()
File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
utils.system("sync")
File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
bg_job = join_bg_jobs(
File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
stdin=stdin)
File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
errread, errwrite)
File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 0
END ABORT ---- ---- timestamp=1244717418 localtime=Jun 11 06:50:18 Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
execfile(self.control, global_control_vars, global_control_vars)
File "/kvm-autotest/client/control", line 1030, in ?
cfg_to_test("kvm_tests.cfg")
File "/kvm-autotest/client/control", line 1013, in cfg_to_test
current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
utils.drop_caches()
File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
utils.system("sync")
File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
bg_job = join_bg_jobs(
File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
stdin=stdin)
File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
errread, errwrite)
File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
[root@dhcp-66-70-9 kvm_runtest_2]#
Thanks,
Michael
14. At the end of the exception handler you should probably re-raise the
exception you caught. Otherwise the user won't see the error message. You can
simply replace 'break' with 'raise' (no parameters), and it should work,
hopefully.
Yes, I should if I add a 'max_vms'.
I think you should re-raise anyway. Otherwise, what's the point in writing
error messages such as "raise error.TestFail("Cannot boot vm anylonger")"?
If you don't re-raise, the user won't see the messages.
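A minimal sketch of the clean-up-then-re-raise pattern discussed in points 10 and 14. FakeVM is a stand-in that models only destroy(gracefully=...), and boot_all is a made-up helper, so this shows the control flow rather than the real test:

```python
class FakeVM(object):
    """Stand-in for a VM object; only destroy() is modelled."""

    def __init__(self):
        self.alive = True

    def destroy(self, gracefully=True):
        self.alive = False


def boot_all(vms, boot):
    """Boot each VM; on any failure kill every VM, then re-raise."""
    try:
        for vm in vms:
            boot(vm)
    except Exception:
        for vm in vms:
            vm.destroy(gracefully=False)
        raise  # bare raise keeps the original exception and its traceback


def broken_boot(vm):
    raise RuntimeError("Cannot create VM")


vms = [FakeVM(), FakeVM()]
try:
    boot_all(vms, broken_boot)
except RuntimeError as e:
    print("Caller still sees the error: %s" % e)
```

With 'break' in place of the bare 'raise', the loop would end quietly and the caller would never see the exception.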
I know these are quite a few comments, but they're all rather minor
and the test is well written in my opinion.
Thank you, I will make the modifications according to your and Uri's
comments, and will re-submit it here later. :)
Thanks and Best Regards,
Yolkfull
Thanks,
Michael
----- Original Message -----
From: "Yolkfull Chow"<yzhou@xxxxxxxxxx>
To:kvm@xxxxxxxxxxxxxxx
Cc: "Uri Lublin"<uril@xxxxxxxxxx>
Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive
Hi,
This test will boot VMs until one of them becomes unresponsive, and
record the maximum number of VMs successfully started.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html