Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

On 06/10/2009 07:52 PM, Michael Goldish wrote:
----- "Yolkfull Chow"<yzhou@xxxxxxxxxx>  wrote:

On 06/10/2009 06:03 PM, Michael Goldish wrote:
----- "Yolkfull Chow"<yzhou@xxxxxxxxxx>   wrote:


On 06/09/2009 05:44 PM, Michael Goldish wrote:

The test looks pretty nicely written. Comments:

1. Consider making all the cloned VMs use image snapshots:

curr_vm = vm1.clone()
curr_vm.get_params()["extra_params"] += " -snapshot"

I'm not sure it's a good idea to let all VMs use the same disk image.

Or maybe you shouldn't add -snapshot yourself, but rather do it in the config
file for the first VM, and then all cloned VMs will have -snapshot as well.


Yes, I use 'image_snapshot = yes' in the config file.

2. Consider changing the message
" Booting the %dth guest" % num
to
"Booting guest #%d" % num
(because there's no such thing as 2th and 3th)

3. Consider changing the message
"Cannot boot vm anylonger"
to
"Cannot create VM #%d" % num

4. Why not add curr_vm to vms immediately after cloning it?
That way you can kill it in the exception handler later, without having to
send it a 'quit' if you can't login ('if not curr_vm_session').
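Roughly something like this (just an untested sketch -- the names are taken
from this thread and from your traceback, not necessarily the exact ones in
your code):

curr_vm = vm1.clone()
vms.append(curr_vm)   # register the clone right away
if not curr_vm.create():
    raise error.TestFail("Cannot create VM #%d" % num)
curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2)
# If the login fails, curr_vm is already in vms, so the exception handler
# can destroy it together with the others -- no need to send it a 'quit'.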

Yes, good idea.

5. " %dth guest boots up successfully" % num -->    again, 2th and
3th

make no sense.

Also, I wonder why you add those spaces before every info
message.
6. "%dth guest's session is not responsive" -->    same
(maybe use "Guest session #%d is not responsive" % num)

7. "Shut down the %dth guest" -->    same
(maybe "Shutting down guest #%d"? or destroying/killing?)

8. Shouldn't we fail the test when we find an unresponsive session?
It seems you just display an error message. You can simply replace
logging.error( with raise error.TestFail(.


9. Consider using a stricter test than just vm_session.is_responsive().
vm_session.is_responsive() just sends ENTER to the sessions and returns True
if it gets anything as a result (usually a prompt, or even just a newline
echoed back). If the session passes this test it is indeed responsive, so
it's a decent test, but maybe you can send some command (user configurable?)
and test for some output. I'm really not sure this is important, because I
can't imagine a session would respond to a newline but not to other commands,
but who knows. Maybe you can send the first VM a user-specified command when
the test begins, remember the output, and then send all other VMs the same
command and make sure the output is the same.


Maybe use 'info status', and send the command 'help' via the sessions to the
VMs and compare their output?

I'm not sure I understand. What does 'info status' do? We're talking about
an SSH shell, not the monitor. You can do whatever you like, like 'uname -a'
and 'ls /', but you should leave it up to the user to decide, so he/she can
specify different commands for different guests. Linux commands won't work
under Windows, so Linux and Windows must have different commands in the
config file. In the Linux section, under '- @Linux:' you can add something
like:

stress_boot:
      stress_boot_test_command = uname -a

and under '- @Windows:':

stress_boot:
      stress_boot_test_command = ver && vol

These commands are just naive suggestions. I'm sure someone can think of
much more informative commands.

Those are really good suggestions. Thanks, Michael. And can I use
'migration_test_command' instead?

Not really. Why would you want to use another test's param?

1. There's no guarantee that 'migration_test_command' is defined for your
boot stress test. In fact, it is probably only defined for migration tests,
so you probably won't be able to access it. Try
params.get('migration_test_command') in your test and you'll probably get
None.

2. The user may not want to run migration at all, and then he/she will
probably not define 'migration_test_command'.

3. The user might want to use different test commands for migration and for
the boot stress test.
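To make this concrete, a rough untested sketch (get_command_output() is a
hypothetical helper standing in for however the session object actually runs
a command and collects its output):

test_command = params.get("stress_boot_test_command")
if test_command:
    # remember the first guest's output as the reference
    reference_output = vm1_session.get_command_output(test_command)

# ... and later, for each newly booted guest:
if test_command:
    output = curr_vm_session.get_command_output(test_command)
    if output != reference_output:
        raise error.TestFail("Guest #%d gives unexpected output for '%s'"
                             % (num, test_command))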

10. I'm not sure you should use the param "kill_vm_gracefully" because
that's a postprocessor param (probably not your business). You can just call
destroy() in the exception handler with gracefully=False, because if the VMs
are non-responsive, I don't expect them to shut down nicely with an SSH
command (that's what gracefully does). Also, we're using -snapshot, so
there's no reason to shut them down nicely.
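i.e. in the handler, instead of looking at the postprocessor's
"kill_vm_gracefully", just do something like (sketch):

curr_vm.destroy(gracefully=False)   # no point attempting a clean SSH
                                    # shutdown of an unresponsive -snapshot guest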


Yes, I agree. :)

11. "Total number booted successfully: %d" % (num - 1) -->    why
not

just num?

We really have num VMs including the first one.
Or you can say: "Total number booted successfully in addition to
the

first one"

but that's much longer.


Since after the first guest boots I set num = 1 and then do 'num += 1' at
the start of the while loop (in order to get a new VM), curr_vm is vm2 (num
is 2) at that point. So if the second VM fails to boot up, the number booted
successfully should be (num - 1).
I will use the enumerate(vms) that Uri suggested to make the number easier
to count.
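Something like this, I think (sketch only; we are on Python 2.4, so
enumerate() takes no start argument):

for i, vm in enumerate(vms):
    logging.info("Checking guest #%d" % (i + 1))
    # check this guest's session here ...
logging.info("Total number booted successfully: %d" % len(vms))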

OK, I didn't notice that.


12. Consider adding a 'max_vms' (or 'threshold') user param to the test. If
num reaches 'max_vms', we stop adding VMs and pass the test. Otherwise the
test will always fail (which is depressing). If params.get("threshold") is
None or "", or in short -- 'if not params.get("threshold")', disable this
feature and keep adding VMs forever. The user can enable the feature with:

max_vms = 50

or disable it with:

max_vms =

This is a good idea, given the hardware resource limits of the host.

13. Why are you catching OSError? If you get OSError it might be a framework
bug.


Because sometimes vm.create() succeeds but the SSH login fails, since the
running Python process cannot allocate physical memory (OSError).
Adding max_vms could fix this problem, I think.

Do you remember exactly where the OSError was thrown? Do you happen to have
a backtrace? (I just want to be sure it's not a bug.)

The OSError was thrown while checking that all VMs are responsive, and I got
many tracebacks saying "OSError: [Errno 12] Cannot allocate memory".
Maybe the last VM was created successfully by luck, but Python could not
allocate physical memory after that, while checking all the sessions.
So can we catch the OSError and tell the user that max_vms is too large?
Sure. I was just worried it might be a framework bug. If it's a legitimate
memory error -- catch it and fail the test.
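e.g. something like this around the part that checks the sessions (sketch;
'sessions' is whatever list holds the open shell sessions):

try:
    for i, session in enumerate(sessions):
        if not session.is_responsive():
            raise error.TestFail("Guest session #%d is not responsive" % (i + 1))
except OSError, e:
    # The host itself can no longer fork/allocate memory -- most likely
    # max_vms is set too high for this machine.
    raise error.TestFail("Got OSError (%s); max_vms is probably too large "
                         "for this host" % e)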

If you happen to catch that OSError again, and get a backtrace, I'd like
to see it if that's possible.
Michael, these are the backtrace messages:

...
20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
20090611-064959 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Postprocessing on error...
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: postprocess_vm: VM object found in environment
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: send_monitor_cmd: Sending monitor command: screendump /kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024>/debug/post_vm1.ppm
20090611-065000 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024: DEBUG: run_once: Contents of environment: {'vm__vm1': <kvm_vm.VM instance at 0x92999a28>}
post-test sysinfo error:
Traceback (most recent call last):
  File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
    fn(*args, **dargs)
File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in log_after_each_test
    log.run(test_sysinfodir)
  File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
    shell=True, env=env)
  File "/usr/lib64/python2.4/subprocess.py", line 412, in call
    return Popen(*args, **kwargs).wait()
  File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
2009-06-11 06:50:02,859 Configuring logger for client level
FAIL kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024> kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024> timestamp=1244717402 localtime=Jun 11 06:50:02 Unhandled OSError: [Errno 12] Cannot allocate memory
          Traceback (most recent call last):
File "/kvm-autotest/client/common_lib/test.py", line 304, in _exec
              self.execute(*p_args, **p_dargs)
File "/kvm-autotest/client/common_lib/test.py", line 187, in execute
              self.run_once(*args, **dargs)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145, in run_once
              routine_obj.routine(self, params, env)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in run_boot_vms curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2) File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in wait_for
              output = func()
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py", line 728, in ssh_login session = kvm_utils.ssh(address, port, username, password, prompt, timeout) File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
              return remote_login(command, password, prompt, "\n", timeout)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in remote_login
              sub = kvm_spawn(command, linesep)
File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in __init__
              (pid, fd) = pty.fork()
            File "/usr/lib64/python2.4/pty.py", line 108, in fork
              pid = os.fork()
          OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 1
END FAIL kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024> kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]<no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024> timestamp=1244717403 localtime=Jun 11 06:50:03
Dropping caches
2009-06-11 06:50:03,409 running: sync
JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
    execfile(self.control, global_control_vars, global_control_vars)
  File "/kvm-autotest/client/control", line 1030, in ?
    cfg_to_test("kvm_tests.cfg")
  File "/kvm-autotest/client/control", line 1013, in cfg_to_test
    current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
  File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
    utils.drop_caches()
  File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
    utils.system("sync")
  File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
    stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
  File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
    bg_job = join_bg_jobs(
  File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
    stdin=stdin)
  File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Persistent state variable __group_level now set to 0
END ABORT ---- ---- timestamp=1244717418 localtime=Jun 11 06:50:18 Unhandled OSError: [Errno 12] Cannot allocate memory
  Traceback (most recent call last):
    File "/kvm-autotest/client/bin/job.py", line 978, in step_engine
      execfile(self.control, global_control_vars, global_control_vars)
    File "/kvm-autotest/client/control", line 1030, in ?
      cfg_to_test("kvm_tests.cfg")
    File "/kvm-autotest/client/control", line 1013, in cfg_to_test
      current_status = job.run_test("kvm_runtest_2", params=dict, tag=tagname)
    File "/kvm-autotest/client/bin/job.py", line 44, in wrapped
      utils.drop_caches()
    File "/kvm-autotest/client/bin/base_utils.py", line 638, in drop_caches
      utils.system("sync")
    File "/kvm-autotest/client/common_lib/utils.py", line 510, in system
      stdout_tee=sys.stdout, stderr_tee=sys.stderr).exit_status
    File "/kvm-autotest/client/common_lib/utils.py", line 330, in run
      bg_job = join_bg_jobs(
    File "/kvm-autotest/client/common_lib/utils.py", line 37, in __init__
      stdin=stdin)
    File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
      errread, errwrite)
    File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
      self.pid = os.fork()
  OSError: [Errno 12] Cannot allocate memory
[root@dhcp-66-70-9 kvm_runtest_2]#
Thanks,
Michael

14. At the end of the exception handler you should probably re-raise the
exception you caught. Otherwise the user won't see the error message. You
can simply replace 'break' with 'raise' (no parameters), and it should work,
hopefully.


Yes, I should, if I add a 'max_vms'.

I think you should re-raise anyway. Otherwise, what's the point in writing
error messages such as "raise error.TestFail("Cannot boot vm anylonger")"?
If you don't re-raise, the user won't see the messages.
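i.e. at the end of the handler, roughly (fragment of the handler, sketch):

except Exception:
    for vm in vms:
        vm.destroy(gracefully=False)
    raise   # was 'break'; a bare raise re-raises the caught exception, so
            # the TestFail messages above actually reach the user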


I know these are quite a few comments, but they're all rather minor, and the
test is well written in my opinion.


Thank you, I will make modifications according to your and Uri's comments,
and will re-submit it here later. :)

Thanks and Best Regards,
Yolkfull

Thanks,
Michael

----- Original Message -----
From: "Yolkfull Chow" <yzhou@xxxxxxxxxx>
To: kvm@xxxxxxxxxxxxxxx
Cc: "Uri Lublin" <uril@xxxxxxxxxx>
Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

Hi,

This test will boot VMs until one of them becomes unresponsive, and records
the maximum number of VMs successfully started.





--
Yolkfull
Regards,

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
