On Wed, Oct 27, 2021 at 04:59:11PM +0300, Mathias Nyman wrote: > > This patch passes my test. Can you please clarify the below question that is > > bothering me. > > Sure, currently this works as we only have one command ring segment. > But lets look at a theoretical case with two segments far apart where > the last command on a segment is aborted > > > > > Here crcr points to the DMA address of the command which is getting > > executed (soon will be aborted) by the xHC. After the ring is stopped, > > we want xHC to execute the *next* command. > > > > Is it guaranteed that the upper 32 bits of previous and next commands will > > be same? Because when the issue happens, xHC takes the 32 bits of crcr > > variable and update it's internal pointer from which it will fetch the commands > > next time the ring is started. I think it is guaranteed because the upper 32 > > bits only may change when we cross the segments but in which case there will > > be a link TRB in the middle. Since xHC command ring is stopped, it won't be > > fetching the link TRB until door bell is rung again. Is my understanding correct? > > Good point. > > This could be an issue as xHC will move the internal pointer to the next > command TRB (past link) right after generating a "command aborted" event, > before generating the "command ring stopped" event. The command ring stopped > event should thus point to the next command TRB to handle. > > I'll assume that the "command aborted" and "command ring stopped" events are generated > before xHC updates the internal pointer based on CRCR write. > > The CRCR write may thus move the internal dequeue pointer back to previous segment. > > This may mess things up, even if the interrupt handler turns the aborted command TRB > to a no-op, it's possible the partial CRCR update could cause internal pointer to > point to first command of old segment. > > > > > Also, what if there is a race with xHC just finishing this command (which we > > are currently aborting) and next link TRB is traversed and processing the next > > command in a different segment. For some reason, we could not update our > > deq pointer and in middle of aborting the command (which is already completed) > > and updating the higher 32 bit with the previous deq segment. This is a > > hypothetical question as we are only using 1 segment for the command ring. > > In this case we won't see the "command aborted" event. Only a "command ring > stopped" event. and yes, the CRCR write could cause the internal dequeue pointer > to be moved back to the old segment, and we have the same issues as in the > previous case. > > These two cases could probably solved by writing the next command pointer to CRCR > instead. > > Only problem I see with this is if xHC hasn't even fetched the command TRB we are > aborting. Then ring will stop at this last command TRB on segment. > Writing the next command TRB pointer to CRCR forces xHC to move to the next > segment without xHC processing the Link TRB, and cycle bit might get out of sync. > > Hard to imagine how we end up in this situation. > > Then there is the third case where xHC handled multiple command TRBs, but interrupts > are blocked so driver is unaware of their completion. > Here its possible we write the wrong segment to CRCR even if we use the next > command TRB pointer. xHC could be way past what driver thinks is the next command. > > For this case we proably need to check for pending interrupts and unhandled command > completions on the event ring before aborting the command. > But that is probably another patch. > > But for now the best solution seems to be just to write next command TRB > pointer to CRCR during abort. > Thanks Mathias for the detailed explanation. Agree that your patch is good enough for the current situation. May be a TODO comment here would help. Thanks, Pavan