Hi,

Sriharsha Allenki <sallenki@xxxxxxxxxxxxxx> writes:
>>>> what problem you actually found? Preferably with tracepoint data
>>>> showing the fault.
>>> Test case here involves f_fs driver in AIO mode and we see ~8 TRBs in
>>> the queue with HWO set and UPDATE_XFER done. In the failure case I see
>>> that as part of processing the interrupt generated by the core for the
>>> completion of the first TRB, the driver is going ahead and giving
>> we shouldn't get completion interrupt for the first TRB, only the
>> last. Care to share tracepoint data?
>
> We have seen the issue only once and we do not have any tracepoint
> data for it. But with the internal logging we have in our downstream code,
> I see a race between dequeue from the function driver, and the giveback
> as part of the completion (XferInProgress).

Which other changes do you have in your downstream code? Could this
problem be caused by some of the changes in your downstream tree?

> A request (say Request-1) is dequeued before we could notify its
> completion to the gadget driver. Because of this, as part of handling
> the completion event for the Request-1 we gave back the next
> request (Request-2) in the queue which is yet to be processed by the
> core leading to the mentioned SMMU fault.

I really need to see tracepoint data of this happening. Every list
modification happens with locks held.

> Normally, the core should not process the TRBs once a request
> has been dequeued because of the stop_active_transfer as part of
> dequeue, but I see a timeout when issuing the end transfer command
> during dequeue because of which core is still processing the TRBs
> in the queue.

Ok, so that's the real problem: End Transfer times out. Are you fixing
the wrong thing?

Please collect tracepoint data with an UPSTREAM kernel. You can't report
a bug on a downstream kernel without reproducing it in the upstream;
otherwise we will be running in circles here.

-- 
balbi