On 09/19/2017 02:05 PM, James Bottomley wrote:
> Actually, the whole problem sounds like a posted write. Likely the
> write that causes the reset doesn't get flushed until the read checking
> if the reset has succeeded, which might explain the 100% initial
> failure. Why not throw away that first value if it's a failure and
> then do your polled wait and timeout on the reset success. We should
> anyway be waiting some time for a reset to be issued, so even on
> non-posted write systems we could see this problem intermittently.
>
> James
>

Thanks for this suggestion, James. I tried removing the sleep and doing
a dummy read of the register with readl() instead; the issue still
reproduced.

I did expect that, since in aac_is_ctrl_up_and_running() we already read
a register, and if it shows us the reset is not complete, we wait and
read it again. So we can think of this 1st read as a dummy one, heheh.

My theory here is that we're observing a failure similar to one we
already saw with some specific NVMe adapters: a readl issued before some
delay has passed (in nvme it was 2s) corrupts the adapter FW procedure.
It's as if the adapter doesn't like to deal with this read while the
reset procedure is ongoing. So, we wait a bit before issuing the readl
and everything goes fine.

Cheers,


Guilherme
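
To make the theory above concrete, here is a minimal userspace sketch of the two orderings being compared: polling the status register right after the reset versus waiting a bit before the first read. Everything in it is an assumption made for illustration. `struct fake_adapter`, `fake_read_status()` (a stand-in for readl()), `reset_and_wait()` and the timing values are hypothetical and do not come from the aacraid or nvme drivers; the "early read disturbs the firmware" behaviour is simply modelled as described in the theory, not measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an adapter whose firmware is upset by a status
 * read that arrives too soon after a reset. Not real hardware behaviour. */
struct fake_adapter {
    unsigned now_ms;      /* simulated clock */
    unsigned reset_at_ms; /* simulated time at which the reset was issued */
    unsigned fragile_ms;  /* a read sooner than this after reset disturbs FW */
    unsigned done_ms;     /* an undisturbed reset completes after this long */
    bool disturbed;       /* set once an early read has hit the firmware */
};

#define FAKE_STATUS_UP 0x1u

static void fake_issue_reset(struct fake_adapter *a)
{
    a->reset_at_ms = a->now_ms;
    a->disturbed = false;
}

/* Stand-in for msleep(): only advances the simulated clock. */
static void fake_sleep_ms(struct fake_adapter *a, unsigned ms)
{
    a->now_ms += ms;
}

/* Stand-in for readl() on the status register. */
static uint32_t fake_read_status(struct fake_adapter *a)
{
    unsigned elapsed = a->now_ms - a->reset_at_ms;

    if (elapsed < a->fragile_ms)
        a->disturbed = true; /* modelled assumption: early read breaks reset */
    if (a->disturbed || elapsed < a->done_ms)
        return 0;
    return FAKE_STATUS_UP;
}

/* Issue a reset, wait settle_ms before the first read, then poll every
 * poll_ms until the adapter reports up or timeout_ms has been spent.
 * Returns 0 on success, -1 on timeout. */
static int reset_and_wait(struct fake_adapter *a, unsigned settle_ms,
                          unsigned poll_ms, unsigned timeout_ms)
{
    unsigned waited = settle_ms;

    assert(poll_ms > 0);
    fake_issue_reset(a);
    fake_sleep_ms(a, settle_ms);

    for (;;) {
        if (fake_read_status(a) & FAKE_STATUS_UP)
            return 0;
        if (waited >= timeout_ms)
            return -1;
        fake_sleep_ms(a, poll_ms);
        waited += poll_ms;
    }
}
```

With the model set to `fragile_ms = 2000` and `done_ms = 3000`, calling `reset_and_wait(&a, 0, 100, 10000)` times out, because the very first read lands inside the fragile window, while `reset_and_wait(&a, 2000, 100, 10000)` succeeds. In this model, discarding the first read value would not help, since the read itself is what does the damage; that matches the observation that the dummy readl() did not fix the issue.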