RE: [PATCH 1/5] xhci: Don't change the ownership of LINK TRB until all the TRB are written

David Laight <David.Laight@xxxxxxxxxx> · Wed, 29 Jan 2014 15:07:53 +0000

From: Sarah Sharp 
> Hi David,
> 
> I've been thinking about this some more, and I'd like to propose a much
> simpler solution.
> 
> The TD fragment rules didn't go into the xHCI specification until the
> 1.0 revision.  The code that follows those rules only seems to trigger
> issues for 0.96 ASMedia hosts, while Intel 1.0 hosts benefit from the
> code when USB ethernet devices are attached.  The patch that changed the
> link TRB behavior (commit 6c12db90f19727c76990e7f4801c67a148b30111) was
> submitted in 2010, and the xHCI 1.0 spec didn't come out until later
> that year, so it's unlikely that Synopsys host is a 1.0 host either.
> 
> So why not just limit your code (and the sg table entry limit) to only
> trigger for 1.0 hosts?  That fix would be much less complex than this
> one.  I would still consider the other cleanup patches on to of it for
> inclusion in 3.15, although it looks like patches 4 and 5 rely on this
> patch.

My suspicion is that it is possible to lock up 0.96 ASMedia hosts (or
other hosts that need the link TRB behaviour change) without my patch
that adds NOPs to the ring.

It will happen much less often than when the NOPs are added, but it
can still happen.

If a TD starts with a LINK TRB then the LINK TRB is the last one that is
owned by the controller until the setup completes. If the controller
finishes the previous transfer while the while the setup is still in
progress then it will process the LINK TRB and we are in exactly
the same situation as before patch 6c12db90f1.
This actually means that patch 6c12db90f1 didn't completely fix the problem.

With the NOPs, the driver writes the first NOP at about the same time
as the previous version would have written the LINK. The rest of the NOPs
and the LINK are then probably written faster than the controller reads them.
So I surmise that if the patch wrote a LINK instead of the first NOP
then the controller would hang just as often.
But this is the same as the case when no NOPs are written because the
previous TD finished in the last slot.

I know this is all failing with disk transfers which I believe are all
single threaded. So maybe the LINK TRB that the controller is reading
is one written in response to a previous transfer completing.
This is a timing race which could be affected by small things (like
putting the controller into a different computer).
But I can't see how it is affected by the number of NOPs (including zero).

A patch that only adds the NOPs for transfers that are expected to be
misaligned (or are not known to be aligned - depending on which way
you want to flag things) is ok by me.
That would allow unlimited length aligned sg transfers on v1.00 hosts.

	David

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html