Re: [PATCH 3/3] xHCI: dynamic ring expansion

Sarah Sharp <sarah.a.sharp@xxxxxxxxxxxxxxx> · Mon, 26 Sep 2011 09:00:29 -0700

On Mon, Sep 26, 2011 at 05:41:05PM +0800, Andiry Xu wrote:
> If room_on_ring() check fails, try to expand the ring and check again.
> 
> When expand a ring, try to find a ring with the same number of segments
> in the cached ring; if not found, allocate a new ring with the same
> cycle state and segment numbers. Link the two rings, update the old
> ring's segment numbers and the last segment pointer.

Why are you using two xhci_ring structures to expand the ring?  It seems
really inefficient to allocate an xhci_ring structure, link the segments
from the two rings together, and then just free the xhci_ring structure.
What if something else needs to be allocated in xhci_ring structure in
the future, and the person expects xhci_free_ring to be called to have
it deallocated?  By just kfreeing the xhci_ring and not its segments
you're breaking the pattern of allocation and deallocation for an
existing structure.

Why not just allocate more segments for the current ring?  You would
just have to manipulate the linked list of ring segments, which
shouldn't be too hard.

There's another issue too:

> When room_on_ring check fails, try ring expansion until the ring has
> more than 256 segments.
> 
> Signed-off-by: Andiry Xu <andiry.xu@xxxxxxx>
> ---
>  drivers/usb/host/xhci-mem.c  |   82 ++++++++++++++++++++++++++++++++++++++----
>  drivers/usb/host/xhci-ring.c |   38 ++++++++++++++------
>  drivers/usb/host/xhci.h      |    3 ++
>  3 files changed, 105 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index 018f825..10f9f91 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -115,6 +115,32 @@ static void xhci_link_segments(struct xhci_hcd *xhci, struct xhci_segment *prev,
>  			(unsigned long long)next->dma);
>  }
>  
> +/*
> + * Link the prev ring to the next ring.
> + *
> + * Link the last link TRB of prev ring to the first segment of next ring;
> + * link the last link TRB of next ring to the first segment of prev ring;
> + * Clear Toggle Cycle for the prev ring.
> + */
> +static void xhci_link_rings(struct xhci_hcd *xhci, struct xhci_ring *prev,
> +		struct xhci_ring *next, bool link_trbs, bool isoc)
> +{
> +	if (!prev || !next)
> +		return;
> +
> +	xhci_link_segments(xhci, prev->last_seg, next->first_seg,
> +				link_trbs, isoc);
> +	xhci_link_segments(xhci, next->last_seg, prev->first_seg,
> +				link_trbs, isoc);
> +	prev->num_segs += next->num_segs;
> +
> +	if (link_trbs) {
> +		prev->last_seg->trbs[TRBS_PER_SEGMENT-1].link.control
> +			&= ~cpu_to_le32(LINK_TOGGLE);

I don't think you can blindly assume you can link the last segment of
the current ring with the new ring segments.  What if you've already
wrapped around the last segment when room_on_ring fails?

For example, say you have a one-segment ring, and someone queues a 61-TRB TD,
which completes before they can submit any more TDs.  Then they try to queue
two 61-TRB TDs in a row (let's call them TD A and B).  For TD A, room_on_ring()
succeeds, the TD is queued, and the link TRB at the end of the segment is given
to the hardware (the cycle bit is toggled).  Then room_on_ring() fails for TD B.

Now the ring will look like this:
 -------------------
| TD A, TRB 4  - CH | <-------
 -------------------         |
| TD A, TRB 5       |        |
 -------------------         |
| ...               |        |
 -------------------         |
| TD A, TRB 61      |        |
 -------------------         |
|                   |        |
 -------------------         |
|                   |        |
 -------------------         |
| TD A, TRB 1  - CH |        |
 -------------------         |
| TD A, TRB 2  - CH |        |
 -------------------         |
| TD A, TRB 3       |        |
 -------------------         |
| LINK TRB          | --------
 -------------------

At that point, you've given control the link TRB to the hardware by
toggling the cycle bit.  If you try to overwrite a link TRB that's given
over to the hardware, one of two things will happen:

1. The hardware has already cached the link TRB, and the write will
never be noticed.  It will just go back to the top of the original
segment after processing TD A, TRB 3.  But adding that new segment won't help
you enqueue TD B.

2. The hardware hasn't cached the link TRB and it will go fetch TRBs in
the new segment.  If the new segments have the cycle bit set to software
owned (which I think you do that in this patch), then hardware will stop
on the first TRB of the new segment (since it doesn't own that TRB) and
fail to execute the rest of TD A.  Then the cancellation code will run,
and goodness knows what it will make of the ring structure and where the
hardware thinks it is stopped.

So you can't just blindly assume you'll always be expanding the ring
before you wrap around the link TRB at the end of the segment.  I would
highly recommend you not pursue the option of adding link TRBs in the
middle of the segment, as none of the ring walking code, cancellation
code, or enqueuing code is set up to handle that.  Because adding link
TRBs in the middle would be such a radical change, I think you really
need pre-emptive ring expansion before overwriting link TRBs.

<snip>
> -	if (!room_on_ring(xhci, ep_ring, num_trbs)) {
> -		/* FIXME allocate more room */
> -		xhci_err(xhci, "ERROR no room on ep ring\n");
> -		return -ENOMEM;
> -	}
> +
> +	while (1) {
> +		if (room_on_ring(xhci, ep_ring, num_trbs))
> +			break;
> +
> +		if (ep_ring->num_segs >= 256) {
> +			xhci_err(xhci, "ERROR no room on ring "
> +						"after ring expansion\n");
> +			return -ENOMEM;
> +		}

Please don't place arbitrary limits on ring expansion like this.  I've
heard from at least one customer that they'll may need ring segments in the
thousands, so I've stopped assuming I know what ring size is "too big".
Just let the kernel run out of memory if drivers attempt to queue too
many big transfers.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html