Sven Schnelle <svens@xxxxxxxxxxxxx> writes:

> Hi,
>
> David Hildenbrand <david@xxxxxxxxxx> writes:
>
>> On 04.05.22 09:37, Janosch Frank wrote:
>>> I had a short look yesterday and the boot usually hangs in the raid6
>>> code. Disabling vector instructions didn't make a difference but a few
>>> interruptions via GDB solve the problem for some reason.
>>>
>>> CCing David and Thomas for TCG
>>>
>>
>> I somehow recall that KASAN was always disabled under TCG, I might be
>> wrong (I thought we'd get a message early during boot that the HW
>> doesn't support KASAN).
>>
>> I recall that raid code is a heavy user of vector instructions.
>>
>> How can I reproduce? Compile upstream (or -next?) with kasan support and
>> run it under TCG?
>
> I spent some time looking into this. It's usually hanging in
> s390vx8_gen_syndrome(). My first thought was that it is a problem with
> the VX instructions, but turned out that it hangs even if i remove all
> the code from s390vx8_gen_syndrome().
>
> Tracing the execution of TB's, i see that the generated code is always
> jumping between a few TB's, but never exiting the TB's to check for
> interrupts (i.e. return to cpu_tb_exec(). I only see calls to
> helper_lookup_tb_ptr to lookup the tb pointer for the next TB.
>
> The raid6 code is waiting for some time to expire by reading jiffies,
> but interrupts are never processed and therefore jiffies doesn't change.
> So the raid6 code hangs forever.
>
> As a test, i made a quick change to test:
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index c997c2e8e0..35819fd5a7 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -319,7 +319,8 @@ const void *HELPER(lookup_tb_ptr)(CPUArchState *env)
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>
>      cflags = curr_cflags(cpu);
> -    if (check_for_breakpoints(cpu, pc, &cflags)) {
> +    if (check_for_breakpoints(cpu, pc, &cflags) ||
> +        unlikely(qatomic_read(&cpu->interrupt_request))) {
>          cpu_loop_exit(cpu);
>      }
>
> And that makes the problem go away. But i'm not familiar with the TCG
> internals, so i can't say whether the generated code is incorrect or
> something else is wrong. I have tcg log files of a failing + working run
> if someone wants to take a look. They are rather large so i would have to
> upload them somewhere.

Whatever is setting cpu->interrupt_request should be calling
cpu_exit(cpu) which sets the exit flag which is checked at the start of
every TB execution (see gen_tb_start).

-- 
Alex Bennée
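
To illustrate the point about the exit flag, here is a rough toy model in
plain C. It is not QEMU code: ToyCPU, toy_raise_irq(),
toy_raise_irq_no_kick() and toy_run_chained() are hypothetical names made
up for this sketch, and the loop only mimics the idea that chained TBs
look at an exit flag at their start and never at interrupt_request
itself. Under those assumptions, an interrupt raised without the kick is
never noticed, which would match the hang described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-vCPU state; this is NOT QEMU's CPUState. */
typedef struct {
    atomic_int interrupt_request; /* pending interrupt bits */
    atomic_bool exit_request;     /* "leave the chained TBs" flag */
} ToyCPU;

/* Sets the pending bits but forgets to kick the CPU (the suspected bug). */
static void toy_raise_irq_no_kick(ToyCPU *cpu, int mask)
{
    atomic_fetch_or(&cpu->interrupt_request, mask);
}

/* Sets the pending bits and also requests an exit from the TB chain. */
static void toy_raise_irq(ToyCPU *cpu, int mask)
{
    atomic_fetch_or(&cpu->interrupt_request, mask);
    atomic_store(&cpu->exit_request, true);
}

/*
 * Run up to max_tbs chained blocks. Each block checks only the exit flag
 * at its start. Returns how many blocks ran before control went back to
 * the (imaginary) main loop where interrupts would be processed.
 */
static int toy_run_chained(ToyCPU *cpu, int max_tbs)
{
    int executed = 0;

    while (executed < max_tbs) {
        if (atomic_load(&cpu->exit_request)) {
            /* Consume the request and leave the chain. */
            atomic_store(&cpu->exit_request, false);
            break;
        }
        executed++; /* the block body would run here */
    }
    return executed;
}
```

With a zero-initialised ToyCPU, calling toy_raise_irq_no_kick() and then
toy_run_chained(&cpu, 1000) runs all 1000 blocks, while calling
toy_raise_irq() first makes the chain stop before the first block with
the interrupt bit still pending for the main loop to handle.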