Re: [kvm-unit-tests PATCH v3 2/2] s390x: Test specification exceptions during transaction

Janis Schoetterl-Glausch <scgl@xxxxxxxxxxxxxxxxxx> · Wed, 27 Oct 2021 12:05:04 +0200



On 10/26/21 16:55, Claudio Imbrenda wrote:
> On Tue, 26 Oct 2021 16:22:40 +0200
> Janis Schoetterl-Glausch <scgl@xxxxxxxxxxxxxxxxxx> wrote:
> 
>> On 10/25/21 19:30, Claudio Imbrenda wrote:
>>> On Fri, 22 Oct 2021 14:01:56 +0200
>>> Janis Schoetterl-Glausch <scgl@xxxxxxxxxxxxx> wrote:
>>>   
>>>> Program interruptions during transactional execution cause other
>>>> interruption codes.
>>>> Check that we see the expected code for (some) specification exceptions.
>>>>
>>>> Signed-off-by: Janis Schoetterl-Glausch <scgl@xxxxxxxxxxxxx>
>>>> ---  
>>
>> [...]
>>
>>>> +#define TRANSACTION_MAX_RETRIES 5
>>>> +
>>>> +/* NULL must be passed to __builtin_tbegin via constant, forbid diagnose from
>>>> + * being NULL to keep things simple
>>>> + */
>>>> +static int __attribute__((nonnull))
>>>> +with_transaction(void (*trigger)(void), struct __htm_tdb *diagnose)
>>>> +{
>>>> +	int cc;
>>>> +  
>>>
>>> if you want to be extra sure, put an assert here (although I'm not sure
>>> how nonnull works, I have never seen it before)  
>>
>> Ok, with nonnull, the compiler might warn you if you pass NULL.
> 
> fair enough
> 
>>>   
>>>> +	cc = __builtin_tbegin(diagnose);
>>>> +	if (cc == _HTM_TBEGIN_STARTED) {
>>>> +		trigger();
>>>> +		__builtin_tend();
>>>> +		return -TRANSACTION_COMPLETED;
>>>> +	} else {
>>>> +		return -cc;
>>>> +	}
>>>> +}
>>>> +
>>>> +static int retry_transaction(const struct spec_ex_trigger *trigger, unsigned int max_retries,
>>>> +			     struct __htm_tdb *tdb, uint16_t expected_pgm)
>>>> +{
>>>> +	int trans_result, i;
>>>> +	uint16_t pgm;
>>>> +
>>>> +	for (i = 0; i < max_retries; i++) {
>>>> +		expect_pgm_int();
>>>> +		trans_result = with_transaction(trigger->func, tdb);
>>>> +		if (trans_result == -_HTM_TBEGIN_TRANSIENT) {
>>>> +			mb();
>>>> +			pgm = lc->pgm_int_code;
>>>> +			if (pgm == 0)
>>>> +				continue;
>>>> +			else if (pgm == expected_pgm)
>>>> +				return 0;
>>>> +		}
>>>> +		return trans_result;
>>>> +	}
>>>> +	return -TRANSACTION_MAX_RETRIES;  
>>>
>>> so this means that a test will be considered failed if the transaction
>>> failed too many times?  
>>
>> Yes.
>>>
>>> this means that could fail if the test is run on busy system, even if
>>> the host running the unit test is correct  
>>
>> I suppose so, don't know how likely that is.
> 
> I don't like the idea of failing a test when the implementation is
> correct, just because the system might be a little more busy than
> expected.

Fair enough, I'll see what I can do.
> 
> if you can't find a way to refactor the test so that it doesn't fail if
> there are too many retries, then at least make it a skip?
> 
> but I'd really like to see something that does not fail on a correctly
> implemented system just because the test machine was too busy.
> 
>>>
>>> also, do you really need to use negative values? it's probably easier
>>> to read if you stick to positive values, and less prone to mistakes if
>>> you accidentally forget a - somewhere.  
>>
>> Ok.
>>>   
>>>> +}
>>>> +
>>>> +static void test_spec_ex_trans(struct args *args, const struct spec_ex_trigger *trigger)
>>>> +{
>>>> +	const uint16_t expected_pgm = PGM_INT_CODE_SPECIFICATION
>>>> +			      | PGM_INT_CODE_TX_ABORTED_EVENT;
>>>> +	union {
>>>> +		struct __htm_tdb tdb;
>>>> +		uint64_t dwords[sizeof(struct __htm_tdb) / sizeof(uint64_t)];
>>>> +	} diag;
>>>> +	unsigned int i, failures = 0;
>>>> +	int trans_result;
>>>> +
>>>> +	if (!test_facility(73)) {
>>>> +		report_skip("transactional-execution facility not installed");
>>>> +		return;
>>>> +	}
>>>> +	ctl_set_bit(0, CTL0_TRANSACT_EX_CTL); /* enable transactional-exec */
>>>> +
>>>> +	for (i = 0; i < args->iterations && failures <= args->max_failures; i++) {
>>>> +		register_pgm_cleanup_func(trigger->fixup);
>>>> +		trans_result = retry_transaction(trigger, args->max_retries, &diag.tdb, expected_pgm);  
>>>
>>> so you retry each iteration up to args->max_retries times, and if a
>>> transaction aborts too many times (maybe because the host system is
>>> very busy), then you consider it a fail
>>>   
>>
>> [...]
>