Re: [PATCH 3/3] xHCI: dynamic ring expansion

On Mon, 2011-09-26 at 09:00 -0700, Sarah Sharp wrote:
> On Mon, Sep 26, 2011 at 05:41:05PM +0800, Andiry Xu wrote:
> > If the room_on_ring() check fails, try to expand the ring and check again.
> > 
> > When expanding a ring, first try to find a ring with the same number of
> > segments in the ring cache; if none is found, allocate a new ring with
> > the same cycle state and number of segments. Link the two rings, and
> > update the old ring's segment count and last segment pointer.
> 
> Why are you using two xhci_ring structures to expand the ring?  It seems
> really inefficient to allocate an xhci_ring structure, link the segments
> from the two rings together, and then just free the xhci_ring structure.
> What if something else needs to be allocated in xhci_ring structure in
> the future, and the person expects xhci_free_ring to be called to have
> it deallocated?  By just kfreeing the xhci_ring and not its segments
> you're breaking the pattern of allocation and deallocation for an
> existing structure.
> 
> Why not just allocate more segments for the current ring?  You would
> just have to manipulate the linked list of ring segments, which
> shouldn't be too hard.
> 

When expanding a ring, I try to use a cached ring first; if a suitable
cached ring is available, its ring structure needs to be freed anyway.
If I need to allocate new segments, using xhci_ring_alloc() is simple
and keeps the code cleaner, since I don't need to call the segment
allocation and linking functions directly.
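
Roughly, what I have in mind is something like the sketch below. This
is only an illustration, not the patch itself: xhci_get_cached_ring()
is a made-up name for the cache lookup, the xhci_ring_alloc() arguments
follow this patch's kernel, and matching the cycle state of a reused
cached ring is glossed over.

static int xhci_ring_expansion(struct xhci_hcd *xhci,
		struct xhci_ring *ring, gfp_t flags)
{
	struct xhci_ring *new_ring;

	/* Prefer a cached ring with the same number of segments. */
	new_ring = xhci_get_cached_ring(xhci, ring->num_segs);
	if (!new_ring)
		/* Otherwise allocate a fresh ring of the same size. */
		new_ring = xhci_ring_alloc(xhci, ring->num_segs,
					   true, false, flags);
	if (!new_ring)
		return -ENOMEM;

	/* Splice the new segments into the existing ring. */
	xhci_link_rings(xhci, ring, new_ring, true, false);

	/*
	 * Only the temporary xhci_ring container is freed here; its
	 * segments now belong to the expanded ring.
	 */
	kfree(new_ring);

	return 0;
}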

Maybe factoring out the segment allocation and linking part of
xhci_ring_alloc() into a separate function is a good idea, so that
code does not need to be duplicated.
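
Something along the lines of the (hypothetically named)
xhci_alloc_segments_for_ring() below could then be shared by
xhci_ring_alloc() and the expansion path. It only uses the existing
xhci_segment_alloc()/xhci_segment_free()/xhci_link_segments() helpers,
but treat it as a sketch rather than tested code:

static int xhci_alloc_segments_for_ring(struct xhci_hcd *xhci,
		struct xhci_segment **first, struct xhci_segment **last,
		unsigned int num_segs, bool link_trbs, bool isoc,
		gfp_t flags)
{
	struct xhci_segment *prev;

	/* Caller guarantees num_segs >= 1. */
	prev = xhci_segment_alloc(xhci, flags);
	if (!prev)
		return -ENOMEM;
	*first = prev;

	while (--num_segs) {
		struct xhci_segment *next = xhci_segment_alloc(xhci, flags);

		if (!next)
			goto free_segments;
		xhci_link_segments(xhci, prev, next, link_trbs, isoc);
		prev = next;
	}
	/* Close the chain back onto the first segment. */
	xhci_link_segments(xhci, prev, *first, link_trbs, isoc);
	*last = prev;

	return 0;

free_segments:
	/* Free the partially built, still linear chain. */
	prev = *first;
	while (prev) {
		struct xhci_segment *next = prev->next;

		xhci_segment_free(xhci, prev);
		prev = next;
	}
	return -ENOMEM;
}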

> There's another issue too:
> 
> > When the room_on_ring() check fails, try ring expansion, giving up
> > once the ring reaches 256 segments.
> > 
> > Signed-off-by: Andiry Xu <andiry.xu@xxxxxxx>
> > ---
> >  drivers/usb/host/xhci-mem.c  |   82 ++++++++++++++++++++++++++++++++++++++----
> >  drivers/usb/host/xhci-ring.c |   38 ++++++++++++++------
> >  drivers/usb/host/xhci.h      |    3 ++
> >  3 files changed, 105 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> > index 018f825..10f9f91 100644
> > --- a/drivers/usb/host/xhci-mem.c
> > +++ b/drivers/usb/host/xhci-mem.c
> > @@ -115,6 +115,32 @@ static void xhci_link_segments(struct xhci_hcd *xhci, struct xhci_segment *prev,
> >  			(unsigned long long)next->dma);
> >  }
> >  
> > +/*
> > + * Link the prev ring to the next ring.
> > + *
> > + * Link the last link TRB of prev ring to the first segment of next ring;
> > + * link the last link TRB of next ring to the first segment of prev ring;
> > + * Clear Toggle Cycle for the prev ring.
> > + */
> > +static void xhci_link_rings(struct xhci_hcd *xhci, struct xhci_ring *prev,
> > +		struct xhci_ring *next, bool link_trbs, bool isoc)
> > +{
> > +	if (!prev || !next)
> > +		return;
> > +
> > +	xhci_link_segments(xhci, prev->last_seg, next->first_seg,
> > +				link_trbs, isoc);
> > +	xhci_link_segments(xhci, next->last_seg, prev->first_seg,
> > +				link_trbs, isoc);
> > +	prev->num_segs += next->num_segs;
> > +
> > +	if (link_trbs) {
> > +		prev->last_seg->trbs[TRBS_PER_SEGMENT-1].link.control
> > +			&= ~cpu_to_le32(LINK_TOGGLE);
> 
> I don't think you can blindly assume you can link the last segment of
> the current ring with the new ring segments.  What if you've already
> wrapped around the last segment when room_on_ring fails?
> 
> For example, say you have a one-segment ring, and someone queues a 61-TRB TD,
> which completes before they can submit any more TDs.  Then they try to queue
> two 61-TRB TDs in a row (let's call them TD A and B).  For TD A, room_on_ring()
> succeeds, the TD is queued, and the link TRB at the end of the segment is given
> to the hardware (the cycle bit is toggled).  Then room_on_ring() fails for TD B.
> 
> Now the ring will look like this:
>  -------------------
> | TD A, TRB 4  - CH | <-------
>  -------------------         |
> | TD A, TRB 5       |        |
>  -------------------         |
> | ...               |        |
>  -------------------         |
> | TD A, TRB 61      |        |
>  -------------------         |
> |                   |        |
>  -------------------         |
> |                   |        |
>  -------------------         |
> | TD A, TRB 1  - CH |        |
>  -------------------         |
> | TD A, TRB 2  - CH |        |
>  -------------------         |
> | TD A, TRB 3       |        |
>  -------------------         |
> | LINK TRB          | --------
>  -------------------
> 
> At that point, you've given control of the link TRB to the hardware by
> toggling the cycle bit.  If you try to overwrite a link TRB that's given
> over to the hardware, one of two things will happen:
> 
> 1. The hardware has already cached the link TRB, and the write will
> never be noticed.  It will just go back to the top of the original
> segment after processing TD A, TRB 3.  But adding that new segment won't help
> you enqueue TD B.
> 
> 2. The hardware hasn't cached the link TRB and it will go fetch TRBs in
> the new segment.  If the new segments have the cycle bit set to software
> owned (which I think you do that in this patch), then hardware will stop
> on the first TRB of the new segment (since it doesn't own that TRB) and
> fail to execute the rest of TD A.  Then the cancellation code will run,
> and goodness knows what it will make of the ring structure and where the
> hardware thinks it is stopped.
> 

Oh, I see your point. Yes, you are right; I hadn't considered this
scenario. So we cannot expand a ring once software has toggled the
cycle bit of the link TRB but the hardware has not yet processed it,
right? If that's the case, preemptive ring expansion is necessary.
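
So the enqueue path would have to check for room and expand before the
enqueue pointer hands the link TRB over to the hardware. Very roughly,
at the point where this patch calls room_on_ring(), and reusing the
xhci_ring_expansion() sketch from above (the function name here is just
a placeholder, and the cycle-state bookkeeping that makes this tricky
is left out):

static int prepare_ring_room(struct xhci_hcd *xhci,
		struct xhci_ring *ep_ring, unsigned int num_trbs,
		gfp_t mem_flags)
{
	while (!room_on_ring(xhci, ep_ring, num_trbs)) {
		/*
		 * Only safe while software still owns the link TRB at
		 * the end of the last segment, i.e. before its cycle
		 * bit has been toggled over to the hardware.
		 */
		if (xhci_ring_expansion(xhci, ep_ring, mem_flags))
			return -ENOMEM;
	}

	return 0;
}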

> So you can't just blindly assume you'll always be expanding the ring
> before you wrap around the link TRB at the end of the segment.  I would
> highly recommend you not pursue the option of adding link TRBs in the
> middle of the segment, as none of the ring walking code, cancellation
> code, or enqueuing code is set up to handle that.  Because adding link
> TRBs in the middle would be such a radical change, I think you really
> need pre-emptive ring expansion before overwriting link TRBs.
> 

When you say "adding link TRBs in the middle of segment", do you mean
adding link TRB not as the last the TRB of a segment, say, the 64th TRB?
I don't think I have done that sort of things. 

> <snip>
> > -	if (!room_on_ring(xhci, ep_ring, num_trbs)) {
> > -		/* FIXME allocate more room */
> > -		xhci_err(xhci, "ERROR no room on ep ring\n");
> > -		return -ENOMEM;
> > -	}
> > +
> > +	while (1) {
> > +		if (room_on_ring(xhci, ep_ring, num_trbs))
> > +			break;
> > +
> > +		if (ep_ring->num_segs >= 256) {
> > +			xhci_err(xhci, "ERROR no room on ring "
> > +						"after ring expansion\n");
> > +			return -ENOMEM;
> > +		}
> 
> Please don't place arbitrary limits on ring expansion like this.  I've
> heard from at least one customer that they may need ring segments in the
> thousands, so I've stopped assuming I know what ring size is "too big".
> Just let the kernel run out of memory if drivers attempt to queue too
> many big transfers.
> 

Ah, I wasn't aware of that. I thought 256 segments would already be
big enough for any transfer...

Thanks,
Andiry

