Hi Mark, On Wed, Jul 12, 2023 at 12:02:30PM +0100, Mark Brown wrote: > When we collect a signal context with one of the SME modes enabled we will > have enabled that mode behind the compiler and libc's back so they may > issue some instructions not valid in streaming mode, causing spurious > failures. > > For the code prior to issuing the BRK to trigger signal handling we need to > stay in streaming mode if we were already there since that's a part of the > signal context the caller is trying to collect. Unfortunately this code > includes a memset() which is likely to be heavily optimised and is likely > to use FP instructions incompatible with streaming mode. We can avoid this > happening by open coding the memset(), inserting a volatile assembly > statement to avoid the compiler recognising what's being done and doing > something in optimisation. This code is not performance critical so the > inefficiency should not be an issue. > > After collecting the context we can simply exit streaming mode, avoiding > these issues. Use a full SMSTOP for safety to prevent any issues appearing > with ZA. Thanks for looking at this. Comments inline. > diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h > index 222093f51b67..db28409fd44b 100644 > --- a/tools/testing/selftests/arm64/signal/test_signals_utils.h > +++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h > @@ -60,13 +60,28 @@ static __always_inline bool get_current_context(struct tdescr *td, > size_t dest_sz) > { > static volatile bool seen_already; > + int i; > + char *uc = (char *)dest_uc; > > assert(td && dest_uc); > /* it's a genuine invocation..reinit */ > seen_already = 0; > td->live_uc_valid = 0; > td->live_sz = dest_sz; > - memset(dest_uc, 0x00, td->live_sz); > + > + /* > + * This is a memset() but we don't want the compiler to > + * optimise it into either instructions or a library call > + * which might be incompatible with streaming mode. > + */ > + for (i = 0; i < td->live_sz; i++) { > + asm volatile("nop" > + : "+m" (*dest_uc) > + : > + : "memory"); I don't think it's save to use "+m" here, since the compiler can assume that the address is used exactly once in the asm. If a post-indexed addressing mode is generated, then you can end up with register corruption. Stepping back, why not use either barrier() or OPTIMIZER_HIDE_VAR() instead? The most robust fix would be to write all of the streaming mode code in asm, but I can appreciate that's a tonne of work for a testcase. Will