Re: [kvm-unit-tests PATCH 1/2] x86: access: Fix timeout failure by limiting number of flag combinations

Babu Moger <babu.moger@xxxxxxx> · Tue, 10 Aug 2021 11:59:38 -0500

On 8/9/21 2:43 PM, Babu Moger wrote:
> 
> 
> On 8/6/21 11:53 AM, Sean Christopherson wrote:
>> On Fri, Aug 06, 2021, Babu Moger wrote:
>>> From: Babu Moger <Babu.Moger@xxxxxxx>
>>>
>>> The test ./x86/access fails with a timeout. This is due to the number test
>>> combination. The test cases increase exponentially as the features get
>>> enabled. The new machine adds the feature AC_CPU_CR4_PKE. The default
>>> timeout is 180 seconds. Seen this problem both on AMD and Intel machines.
>>>
>>> #./tests/access
>>> qemu-system-x86_64: terminating on signal 15 from pid 20050 (timeout)
>>> FAIL access (timeout; duration=180)
>>>
>>> This test can take about 7 minutes without timeout.
>>> time ./tests/access
>>> 58982405 tests, 0 failures
>>> PASS access
>>>
>>> real	7m10.063s
>>> user	7m9.063s
>>> sys	0m0.309s
>>>
>>> Fix the problem by adding few more limit checks.
>>
>> Please state somewhere in the changelog what is actually being changed, and the
>> actual effect of the change.  E.g.
>>
>>   Disallow protection keys testcase in combination with reserved bit
>>   testcasess to further limit the number of tests run to avoid timeouts on
>>   systems with support for protection keys.
>>
>>   Disallowing this combination reduces the total number of tests from
>>   58982405 to <???>, and the runtime from ~7 minutes to <???>
> 
> Sure. Will do.
>>
>>> Signed-off-by: Babu Moger <Babu.Moger@xxxxxxx>
>>> ---
>>>  x86/access.c |    4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/x86/access.c b/x86/access.c
>>> index 47807cc..e371dd5 100644
>>> --- a/x86/access.c
>>> +++ b/x86/access.c
>>> @@ -317,9 +317,9 @@ static _Bool ac_test_legal(ac_test_t *at)
>>>      /*
>>>       * Shorten the test by avoiding testing too many reserved bit combinations
>>>       */
>>> -    if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1)
>>> +    if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_CPU_CR4_PKE)) > 1)
>>>          return false;
>>> -    if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1)
>>> +    if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36) + F(AC_CPU_CR4_PKE)) > 1)
>>
>> Why are protection keys the sacrifical lamb?  Simply because they're the newest?
> 
> Yes. I added it because it was new ):.
>>
>> And before we start killing off arbitrary combinations, what about sharding the
>> test so that testcases that depend on a specific CR0/CR4/EFER bit, i.e. trigger
>> a VM-Exit when the configuration changes, are separate runs?  Being able to run
>> a specific combination would also hopefully make it easier to debug issues as
>> the user could specify which combo to run without having to modify the code and
>> recompile.
>>
>> That probably won't actually reduce the total run time, but it would make each
>> run a separate test and give developers a warm fuzzy feeling that they're indeed
>> making progress :-)
>>
>> Not sure how this could be automagically expressed this in unittest.cfg though...

As we know now that we cannot run a huge number of tests without running
into timeout, I was thinking of adding a extra parameter "max_runs" for
these tests and add a check in ac_test_run to limit the number of runs.
The max_runs will be set to default 10000000. But it can be changed in
unittests.cfg. Something like this.

[access]
 file = access.flat
 arch = x86_64
-extra_params = -cpu max
+extra_params = -cpu max -append 10000000
 timeout = 180

Thoughts?

> 
> Let me investigate if we can do that fairly easy. Will let you know.
> Thanks
> Babu
>>
>>>          return false;
>>>  
>>>      return true;
>>>