On Thu, Jul 20, 2023 at 07:42:09PM +0100, Mark Brown wrote: > When we collect a signal context with one of the SME modes enabled we will > have enabled that mode behind the compiler and libc's back so they may > issue some instructions not valid in streaming mode, causing spurious > failures. > > For the code prior to issuing the BRK to trigger signal handling we need to > stay in streaming mode if we were already there since that's a part of the > signal context the caller is trying to collect. Unfortunately this code > includes a memset() which is likely to be heavily optimised and is likely > to use FP instructions incompatible with streaming mode. We can avoid this > happening by open coding the memset(), inserting a volatile assembly > statement to avoid the compiler recognising what's being done and doing > something in optimisation. This code is not performance critical so the > inefficiency should not be an issue. > > After collecting the context we can simply exit streaming mode, avoiding > these issues. Use a full SMSTOP for safety to prevent any issues appearing > with ZA. > > Reported-by: Will Deacon <will@xxxxxxxxxx> > Signed-off-by: Mark Brown <broonie@xxxxxxxxxx> > --- > Changes in v3: > - Open code OPTIMISER_HIDE_VAR() instead of the memory clobber. > - Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf6@xxxxxxxxxx > > Changes in v2: > - Rebase onto v6.5-rc1. > - Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829e@xxxxxxxxxx > --- > .../selftests/arm64/signal/test_signals_utils.h | 25 +++++++++++++++++++++- > 1 file changed, 24 insertions(+), 1 deletion(-) > > diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h > index 222093f51b67..c7f5627171dd 100644 > --- a/tools/testing/selftests/arm64/signal/test_signals_utils.h > +++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h > @@ -60,13 +60,25 @@ static __always_inline bool get_current_context(struct tdescr *td, > size_t dest_sz) > { > static volatile bool seen_already; > + int i; > + char *uc = (char *)dest_uc; > > assert(td && dest_uc); > /* it's a genuine invocation..reinit */ > seen_already = 0; > td->live_uc_valid = 0; > td->live_sz = dest_sz; > - memset(dest_uc, 0x00, td->live_sz); > + > + /* > + * This is a memset() but we don't want the compiler to > + * optimise it into either instructions or a library call > + * which might be incompatible with streaming mode. > + */ > + for (i = 0; i < td->live_sz; i++) { > + uc[i] = 0; > + __asm__ ("" : "=r" (uc[i]) : "0" (uc[i])); > + } Looking more closely at this, I see that the bpf and kvm selftests are able to include <linux/compiler.h> directly, so I don't see why we need to open-code this here. I also spotted that we've *already* written our own version of this as the 'curse()' macro in selftests/arm64/bti/compiler.h! So I think we should either use linux/compiler.h or make our curse macro usable to all the arm64 selftests. What do you think? Will