On 10/28/2009 08:54 AM, Michael Goldish wrote:
----- "Dor Laor"<dlaor@xxxxxxxxxx> wrote:
On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:
Hi Michael, I am reviewing your patchset and have just a minor
remark
to make here:
On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldish<mgoldish@xxxxxxxxxx>
wrote:
This patch adds a new test that checks the timedrift introduced by
migrations.
It uses the same parameters used by the timedrift test to get the
guest time.
In addition, the number of migrations the test performs is
controlled by the
parameter 'migration_iterations'.
Signed-off-by: Michael Goldish<mgoldish@xxxxxxxxxx>
---
client/tests/kvm/kvm_tests.cfg.sample | 33
++++---
client/tests/kvm/tests/timedrift_with_migration.py | 95
++++++++++++++++++++
2 files changed, 115 insertions(+), 13 deletions(-)
create mode 100644
client/tests/kvm/tests/timedrift_with_migration.py
diff --git a/client/tests/kvm/kvm_tests.cfg.sample
b/client/tests/kvm/kvm_tests.cfg.sample
index 540d0a2..618c21e 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -100,19 +100,26 @@ variants:
type = linux_s3
- timedrift: install setup
- type = timedrift
extra_params += " -rtc-td-hack"
- # Pin the VM and host load to CPU #0
- cpu_mask = 0x1
- # Set the load and rest durations
- load_duration = 20
- rest_duration = 20
- # Fail if the drift after load is higher than 50%
- drift_threshold = 50
- # Fail if the drift after the rest period is higher than
10%
- drift_threshold_after_rest = 10
- # For now, make sure this test is executed alone
- used_cpus = 100
+ variants:
+ - with_load:
+ type = timedrift
+ # Pin the VM and host load to CPU #0
+ cpu_mask = 0x1
Let's use -smp 2 always.
We can also just make -smp 2 the default for all tests. Does that sound
good?
Yes
btw: we need not to parallel the load test with standard tests.
We already don't, because the load test has used_cpus = 100 which
forces it to run alone.
Soon I'll have 100 on my laptop :), better change it to -1 or MAX_INT
+ # Set the load and rest durations
+ load_duration = 20
+ rest_duration = 20
Even the default duration here seems way too brief here, is there
any
reason why 20s was chosen instead of, let's say, 1800s? I am under
the
impression that 20s of load won't be enough to cause any noticeable
drift...
+ # Fail if the drift after load is higher than 50%
+ drift_threshold = 50
+ # Fail if the drift after the rest period is
higher than 10%
+ drift_threshold_after_rest = 10
I am also curious about those tresholds and the reasoning behind
them.
Is there any official agreement on what we consider to be an
unreasonable drift?
Another thing that struck me out is drift calculation: On the
original
timedrift test, the guest drift is normalized against the host
drift:
drift = 100.0 * (host_delta - guest_delta) / host_delta
While in the new drift tests, we consider only the guest drift. I
believe is better to normalize all tests based on one drift
calculation criteria, and those values should be reviewed, and at
least a certain level of agreement on our development community
should
be reached.
I think we don't need to calculate drift ratio. We should define a
threshold in seconds, let's say 2 seconds. Beyond that, there should
not be any drift.
Are you talking about the timedrift with load or timedrift with
migration or reboot tests? I was told that when running the load test
for e.g 60 secs, the drift should be given in % of that duration.
In the case of migration and reboot, absolute durations are used (in
seconds, no %). Should we do that in the load test too?
Yes, but: during extreme load, we do predict that a guest *without* pv
clock will drift and won't be able to catchup until the load stops and
only then it will catchup. So my recommendation is to do the following:
- pvclock guest - can check with 'cat
/sys/devices/system/clocksource/clocksource0/current_clocksource ' don't
allow drift during huge loads.
Exist (+safe) for rhel5.4 guests and ~2.6.29 (from 2.6.27).
- non-pv clock - run the load, stop the load, wait 5 seconds, measure time
For both, use absolute times.
Do we support migration to a different host? We should, especially in
this test too. The destination host reading should also be used.
Apart for that, good patchset, and good thing you refactored some of
the code to shared utils.
We don't, and it would be very messy to implement with the framework
right now. We should probably do that as some sort of server side test,
but we don't have server side tests right now, so doing it may take a
little time and effort. I got the impression that there are more
important things to do at the moment, but please correct me if I'm wrong.
It is needed and will help us checking migration from one physical cpu
to another, different host versions and of course the time gap potential
bug upon migration.
Other than this concern that came to my mind, the new tests look
good
and work fine here. I had to do a slight rebase in one of the
patches,
very minor stuff. The default values and the drift calculation can
be
changed on a later time. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html