Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

Michael Goldish <mgoldish@xxxxxxxxxx> · Wed, 10 Jun 2009 07:52:17 -0400 (EDT)

----- "Yolkfull Chow" <yzhou@xxxxxxxxxx> wrote:

> On 06/10/2009 06:03 PM, Michael Goldish wrote:
> > ----- "Yolkfull Chow"<yzhou@xxxxxxxxxx>  wrote:
> >
> >    
> >> On 06/09/2009 05:44 PM, Michael Goldish wrote:
> >>      
> >>> The test looks pretty nicely written. Comments:
> >>>
> >>> 1. Consider making all the cloned VMs use image snapshots:
> >>>
> >>> curr_vm = vm1.clone()
> >>> curr_vm.get_params()["extra_params"] += " -snapshot"
> >>>
> >>> I'm not sure it's a good idea to let all VMs use the same disk
> >>>        
> >> image.
> >>      
> >>> Or maybe you shouldn't add -snapshot yourself, but rather do it
> in
> >>>        
> >> the config
> >>      
> >>> file for the first VM, and then all cloned VMs will have
> -snapshot
> >>>        
> >> as well.
> >>      
> >>>
> >>>        
> >> Yes I use 'image_snapshot = yes' in config file.
> >>      
> >>> 2. Consider changing the message
> >>> " Booting the %dth guest" % num
> >>> to
> >>> "Booting guest #%d" % num
> >>> (because there's no such thing as 2th and 3th)
> >>>
> >>> 3. Consider changing the message
> >>> "Cannot boot vm anylonger"
> >>> to
> >>> "Cannot create VM #%d" % num
> >>>
> >>> 4. Why not add curr_vm to vms immediately after cloning it?
> >>> That way you can kill it in the exception handler later, without
> >>>        
> >> having
> >>      
> >>> to send it a 'quit' if you can't login ('if not
> curr_vm_session').
> >>>
> >>>        
> >> Yes, good idea.
> >>      
> >>> 5. " %dth guest boots up successfully" % num -->   again, 2th and
> 3th
> >>>        
> >> make no sense.
> >>      
> >>> Also, I wonder why you add those spaces before every info
> message.
> >>>
> >>> 6. "%dth guest's session is not responsive" -->   same
> >>> (maybe use "Guest session #%d is not responsive" % num)
> >>>
> >>> 7. "Shut down the %dth guest" -->   same
> >>> (maybe "Shutting down guest #%d"? or destroying/killing?)
> >>>
> >>> 8. Shouldn't we fail the test when we find an unresponsive
> session?
> >>> It seems you just display an error message. You can simply
> replace
> >>> logging.error( with raise error.TestFail(.
> >>>
> >>>        
> >>      
> >>> 9. Consider using a stricter test than just
> >>>        
> >> vm_session.is_responsive().
> >>      
> >>> vm_session.is_responsive() just sends ENTER to the sessions and
> >>>        
> >> returns
> >>      
> >>> True if it gets anything as a result (usually a prompt, or even
> just
> >>>        
> >> a
> >>      
> >>> newline echoed back). If the session passes this test it is
> indeed
> >>> responsive, so it's a decent test, but maybe you can send some
> >>>        
> >> command
> >>      
> >>> (user configurable?) and test for some output. I'm really not
> sure
> >>>        
> >> this
> >>      
> >>> is important, because I can't imagine a session would respond to
> a
> >>>        
> >> newline
> >>      
> >>> but not to other commands, but who knows. Maybe you can send the
> >>>        
> >> first VM
> >>      
> >>> a user-specified command when the test begins, remember the
> output,
> >>>        
> >> and
> >>      
> >>> then send all other VMs the same command and make sure the output
> is
> >>>        
> >> the
> >>      
> >>> same.
> >>>
> >>>        
> >> maybe use 'info status' and send command 'help' via session to vms
> and
> >> compare their output?
> >>      
> > I'm not sure I understand. What does 'info status' do? We're talking
> about
> > an SSH shell, not the monitor. You can do whatever you like, like
> 'uname -a',
> > and 'ls /', but you should leave it up to the user to decide, so
> he/she
> > can specify different commands for different guests. Linux commands
> won't
> > work under Windows, so Linux and Windows must have different
> commands in
> > the config file. In the Linux section, under '- @Linux:' you can
> add
> > something like:
> >
> > stress_boot:
> >      stress_boot_test_command = uname -a
> >
> > and under '- @Windows:':
> >
> > stress_boot:
> >      stress_boot_test_command = ver && vol
> >
> > These commands are just naive suggestions. I'm sure someone can
> think of
> > much more informative commands.
> >    
> That's really good suggestions.  Thanks, Michael.  And can I use 
> 'migration_test_command' instead?

Not really. Why would you want to use another test's param?

1. There's no guarantee that 'migration_test_command' is defined
for your boot stress test. In fact, it is probably only defined for
migration tests, so you probably won't be able to access it. Try
params.get('migration_test_command') in your test and you'll probably
get None.

2. The user may not want to run migration at all, and then he/she
will probably not define 'migration_test_command'.

3. The user might want to use different test commands for migration
and for the boot stress test.

> >>> 10. I'm not sure you should use the param "kill_vm_gracefully"
> >>>        
> >> because that's
> >>      
> >>> a postprocessor param (probably not your business). You can just
> >>>        
> >> call
> >>      
> >>> destroy() in the exception handler with gracefully=False, because
> if
> >>>        
> >> the VMs
> >>      
> >>> are non- responsive, I don't expect them to shutdown nicely with
> an
> >>>        
> >> SSH
> >>      
> >>> command (that's what gracefully does). Also, we're using
> -snapshot,
> >>>        
> >> so
> >>      
> >>> there's no reason to shut them down nicely.
> >>>
> >>>        
> >> Yes,  I agree. :)
> >>      
> >>> 11. "Total number booted successfully: %d" % (num - 1) -->   why
> not
> >>>        
> >> just num?
> >>      
> >>> We really have num VMs including the first one.
> >>> Or you can say: "Total number booted successfully in addition to
> the
> >>>        
> >> first one"
> >>      
> >>> but that's much longer.
> >>>
> >>>        
> >> Since after the first guest booted, I set num = 1 and then  'num +=
> 1'
> >>
> >> at first in while loop ( for the purpose of getting a new vm ).
> >> So curr_vm is vm2 ( num is 2) now. If the second vm failed to boot
> up,
> >> the num booted successfully should be (num - 1).
> >> I would use enumerate(vms) that Uri suggested to make number easier
> to
> >> count.
> >>      
> > OK, I didn't notice that.
> >
> >    
> >>> 12. Consider adding a 'max_vms' (or 'threshold') user param to
> the
> >>>        
> >> test. If
> >>      
> >>> num reaches 'max_vms', we stop adding VMs and pass the test.
> >>>        
> >> Otherwise the
> >>      
> >>> test will always fail (which is depressing). If
> >>>        
> >> params.get("threshold") is
> >>      
> >>> None or "", or in short -- 'if not params.get("threshold")',
> disable
> >>>        
> >> this
> >>      
> >>> feature and keep adding VMs forever. The user can enable the
> feature
> >>>        
> >> with:
> >>      
> >>> max_vms = 50
> >>> or disable it with:
> >>> max_vms =
> >>>
> >>>        
> >> This is a good idea for hardware resource limit of host.
> >>      
> >>> 13. Why are you catching OSError? If you get OSError it might be
> a
> >>>        
> >> framework bug.
> >>      
> >>>
> >>>        
> >> Since sometimes, vm.create() successfully but failed to ssh-login
> >> since
> >> the running python cannot allocate physical memory (OSError).
> >> Add max_vms could fix this problem I think.
> >>      
> > Do you remember exactly where OSError was thrown? Do you happen to
> have
> > a backtrace? (I just want to be very it's not a bug.)
> >    
> The OSError was thrown when checking all VMs are responsive and I got
> many traceback about "OSError: [Errno 12] Cannot allocate memory".
> Maybe since when last VM was created successfully with lucky,  whereas
> python cannot get physical memory after that when checking all
> sessions.
> So can we now catch the OSError and tell user the number of max_vms 
> is too large?

Sure. I was just worried it might be a framework bug. If it's a legitimate
memory error -- catch it and fail the test.

If you happen to catch that OSError again, and get a backtrace, I'd like
to see it if that's possible.

Thanks,
Michael

> >>> 14. At the end of the exception handler you should proably
> re-raise
> >>>        
> >> the exception
> >>      
> >>> you caught. Otherwise the user won't see the error message. You
> can
> >>>        
> >> simply replace
> >>      
> >>> 'break' with 'raise' (no parameters), and it should work,
> >>>        
> >> hopefully.
> >>      
> >>>
> >>>        
> >> Yes I should if add a 'max_vms'.
> >>      
> > I think you should re-raise anyway. Otherwise, what's the point in
> writing
> > error messages such as "raise error.TestFail("Cannot boot vm
> anylonger")"?
> > I you don't re-raise, the user won't see the messages.
> >
> >    
> >>> I know these are quite a few comments, but they're all rather
> minor
> >>>        
> >> and the test
> >>      
> >>> is well written in my opinion.
> >>>
> >>>        
> >> Thank you,  I will do modification according to your and Uri's
> >> comments,
> >> and will re-submit it here later. :)
> >>
> >> Thanks and Best Regards,
> >> Yolkfull
> >>      
> >>> Thanks,
> >>> Michael
> >>>
> >>> ----- Original Message -----
> >>> From: "Yolkfull Chow"<yzhou@xxxxxxxxxx>
> >>> To:kvm@xxxxxxxxxxxxxxx
> >>> Cc: "Uri Lublin"<uril@xxxxxxxxxx>
> >>> Sent: Tuesday, June 9, 2009 11:41:54 AM (GMT+0200) Auto-Detected
> >>> Subject: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one
> of
> >>>        
> >> them becomes unresponsive
> >>      
> >>>
> >>> Hi,
> >>>
> >>> This test will boot VMs until one of them becomes unresponsive,
> and
> >>> records the maximum number of VMs successfully started.
> >>>
> >>>
> >>>
> >>>        
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>      
> 
> 
> -- 
> Yolkfull
> Regards,
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html