On 5/26/20 5:55 AM, Cornelia Huck wrote:
> On Wed, 13 May 2020 16:29:30 +0200
> Eric Farman <farman@xxxxxxxxxxxxx> wrote:
>
>> There was some suggestion earlier about locking the FSM, but I'm not
>> seeing any problems with that. Rather, what I'm noticing is that the
>> flow between a synchronous START and asynchronous HALT/CLEAR have
>> different impacts on the FSM state. Consider:
>>
>> CPU 1 CPU 2
>>
>> SSCH (set state=CP_PENDING)
>> INTERRUPT (set state=IDLE)
>> CSCH (no change in state)
>> SSCH (set state=CP_PENDING)
>> INTERRUPT (set state=IDLE)
>> INTERRUPT (set state=IDLE)
>
> A different question (not related to how we want to fix this): How
> easily can you trigger this bug? Is this during normal testing with a
> bit of I/O stress, or do you have a special test case?
>

I have hit this with "normal testing with a bit of I/O stress" but it's
been maddeningly slow to repro (invariably when I'm not running with any
detailed traces enabled). So I expedite the process with the channel
path handling code, and this script running on the host:

while True:
    tempChpid = random.choice(chpids)
    tempFunction = random.choice(["-c", "-v"])
    doChzdev(tempFunction, "0", tempChpid)
    doSleep()
    doChzdev(tempFunction, "1", tempChpid)
    doSleep()
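
The mail does not show doChzdev(), doSleep(), or the chpids list, so the
loop above cannot be run as posted. What follows is only a minimal sketch
of the same loop with hypothetical stand-ins, usable as a dry run. The
assumptions: the CHPID values are made-up placeholders, the loop is
bounded rather than "while True", and the optional run callback marks
where a real command invocation would go (the actual command is not
given in the mail).

```python
import random
import time


def do_chzdev(function, value, chpid, run=None):
    """Hypothetical stand-in for doChzdev(): build the argument list and,
    if a callback was supplied, hand the list to it."""
    args = [function, value, chpid]
    if run is not None:
        run(args)  # assumption: a real runner would invoke the host command here
    return args


def do_sleep(low=0.0, high=0.0):
    """Hypothetical stand-in for doSleep(): pause for a random interval."""
    time.sleep(random.uniform(low, high))


def stress(chpids, iterations, run=None, rng=random, sleep=do_sleep):
    """Bounded version of the loop: pick a channel path and a function,
    switch it to "0", pause, switch it back to "1", pause."""
    issued = []
    for _ in range(iterations):
        chpid = rng.choice(chpids)
        function = rng.choice(["-c", "-v"])
        issued.append(do_chzdev(function, "0", chpid, run))
        sleep()
        issued.append(do_chzdev(function, "1", chpid, run))
        sleep()
    return issued


if __name__ == "__main__":
    # Placeholder CHPIDs, not taken from the mail.
    for args in stress(["0.40", "0.41"], iterations=3):
        print(" ".join(args))
```

Keeping the runner as a callback means the sequencing (same function and
same channel path for the "0" and "1" steps) can be checked without
touching real channel paths.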