Re: [kvm-unit-tests PATCH 1/2] x86: access: Fix timeout failure by limiting number of flag combinations

Babu Moger <babu.moger@xxxxxxx> · Tue, 10 Aug 2021 18:38:27 -0500

On 8/10/21 11:59 AM, Babu Moger wrote:
> 
> 
> On 8/9/21 2:43 PM, Babu Moger wrote:
>>
>>
>> On 8/6/21 11:53 AM, Sean Christopherson wrote:
>>> On Fri, Aug 06, 2021, Babu Moger wrote:
>>>> From: Babu Moger <Babu.Moger@xxxxxxx>
>>>>
>>>> The test ./x86/access fails with a timeout. This is due to the number test
>>>> combination. The test cases increase exponentially as the features get
>>>> enabled. The new machine adds the feature AC_CPU_CR4_PKE. The default
>>>> timeout is 180 seconds. Seen this problem both on AMD and Intel machines.
>>>>
>>>> #./tests/access
>>>> qemu-system-x86_64: terminating on signal 15 from pid 20050 (timeout)
>>>> FAIL access (timeout; duration=180)
>>>>
>>>> This test can take about 7 minutes without timeout.
>>>> time ./tests/access
>>>> 58982405 tests, 0 failures
>>>> PASS access
>>>>
>>>> real	7m10.063s
>>>> user	7m9.063s
>>>> sys	0m0.309s
>>>>
>>>> Fix the problem by adding few more limit checks.
>>>
>>> Please state somewhere in the changelog what is actually being changed, and the
>>> actual effect of the change.  E.g.
>>>
>>>   Disallow protection keys testcase in combination with reserved bit
>>>   testcasess to further limit the number of tests run to avoid timeouts on
>>>   systems with support for protection keys.
>>>
>>>   Disallowing this combination reduces the total number of tests from
>>>   58982405 to <???>, and the runtime from ~7 minutes to <???>
>>
>> Sure. Will do.
>>>
>>>> Signed-off-by: Babu Moger <Babu.Moger@xxxxxxx>
>>>> ---
>>>>  x86/access.c |    4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/x86/access.c b/x86/access.c
>>>> index 47807cc..e371dd5 100644
>>>> --- a/x86/access.c
>>>> +++ b/x86/access.c
>>>> @@ -317,9 +317,9 @@ static _Bool ac_test_legal(ac_test_t *at)
>>>>      /*
>>>>       * Shorten the test by avoiding testing too many reserved bit combinations
>>>>       */
>>>> -    if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13)) > 1)
>>>> +    if ((F(AC_PDE_BIT51) + F(AC_PDE_BIT36) + F(AC_PDE_BIT13) + F(AC_CPU_CR4_PKE)) > 1)
>>>>          return false;
>>>> -    if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36)) > 1)
>>>> +    if ((F(AC_PTE_BIT51) + F(AC_PTE_BIT36) + F(AC_CPU_CR4_PKE)) > 1)
>>>
>>> Why are protection keys the sacrifical lamb?  Simply because they're the newest?
>>
>> Yes. I added it because it was new ):.
>>>
>>> And before we start killing off arbitrary combinations, what about sharding the
>>> test so that testcases that depend on a specific CR0/CR4/EFER bit, i.e. trigger
>>> a VM-Exit when the configuration changes, are separate runs?  Being able to run
>>> a specific combination would also hopefully make it easier to debug issues as
>>> the user could specify which combo to run without having to modify the code and
>>> recompile.
>>>
>>> That probably won't actually reduce the total run time, but it would make each
>>> run a separate test and give developers a warm fuzzy feeling that they're indeed
>>> making progress :-)
>>>
>>> Not sure how this could be automagically expressed this in unittest.cfg though...
> 
> As we know now that we cannot run a huge number of tests without running
> into timeout, I was thinking of adding a extra parameter "max_runs" for
> these tests and add a check in ac_test_run to limit the number of runs.
> The max_runs will be set to default 10000000. But it can be changed in
> unittests.cfg. Something like this.
> 
> [access]
>  file = access.flat
>  arch = x86_64
> -extra_params = -cpu max
> +extra_params = -cpu max -append 10000000

No. This will not work. The PKU feature flag is bit 30. That is 2^30
iterations to cover the tests for this feature. Looks like I need to split
the tests into PKU and non PKU tests. For PKU tests I may need to change
the bump frequency (in ac_test_bump_one) to much higher value. Right now,
it is 1. Let me try that,

>  timeout = 180
> 
> Thoughts?
> 
>>
>> Let me investigate if we can do that fairly easy. Will let you know.
>> Thanks
>> Babu
>>>
>>>>          return false;
>>>>  
>>>>      return true;
>>>>