Hi, On Thu, Sep 25, 2014 at 05:50:59PM +0000, Paul Zimmerman wrote: > > From: Felipe Balbi [mailto:balbi@xxxxxx] > > Sent: Wednesday, September 24, 2014 7:41 PM > > > > On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote: > > > > > > Then, after the host cleared the halt, the gadget apparently sent the > > > > > > data that _should_ have been sent previously. The host was expecting > > > > > > to receive the CSW at this point, so there was an overflow error. > > > > > > That's what caused the host to perform a reset. > > > > > > > > > > > > Evidently this UDC implements the set_halt method incorrectly. > > > > > > According to the kerneldoc for usb_ep_set_halt: > > > > > > > > > > > > * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any > > > > > > * transfer requests are still queued, or if the controller hardware > > > > > > * (usually a FIFO) still holds bytes that the host hasn't collected. > > > > > > > > > > damn old bugs :-) I'll fix that up and Cc stable. > > > > > > > > alright fixed. Below you can see a combined diff which I'll still split > > > > into patches so they can be applied properly. > > > > > > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ] > > > > > > Ick. That shouldn't be necessary. The Synopsys vendor driver works > > > > you need to make sure that both there are no pending transfers and FIFO > > is empty in case of IN endpoints. Databook itself says (sorry, forgot > > the section and no access to the docs right now) that to figure out if > > the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any > > better ideas ? > > I guess I'm just afraid this may just be papering over the real cause. Well, the API documentation, as pointed out by Alan, calls for: 380 * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any 381 * transfer requests are still queued, or if the controller hardware 382 * (usually a FIFO) still holds bytes that the host hasn't collected. There's no other way to ensuring FIFO is empty with this core. > > > fine with the mass-storage gadget (with stall=1) without doing anything > > > like that. > > > > > > When did the dwc3 driver start showing this problem? Wouldn't it be > > > better to find the change which caused this? Or has dwc3 always had > > > > it's probably been like that since ever :-) I just figured that my > > scripts always had stall=0 and ended up never testing the other way. > > > > > this problem, but you never tested with stall=1 before so didn't see > > > it? > > > > yup. Note that it's not enough to checked for TRB completion because > > there could still be data in the FIFO which the host hasn't moved yet, > > unless it's 100% guaranteed that the core won't fire XferComplete until > > every single bit of data has been moved by the host. > > That is supposed to be 100% guaranteed. The IN XferInProgress and > XferComplete events are only delivered after the host has ACKed the > packet. > > > But the way things are, I'd expect core to be firing transfer complete > > as soon as all data has reached the FIFO (in case of TX). > > Nope. > > That's why I don't understand how this can happen for IN. AFAIK, a STALL > is only sent in response to something the host sent in the CBW. At that > point, there shouldn't be any IN transfers active. you could race with the HW, right ? You just issued Start Transfer for that IN transfer and control returns to the gadget driver which can, at that very moment, issue a Set Stall. SW *must* make sure that there's nothing pending yet. We haven't received XferComplete for that IN we just started. > BTW, about a year ago, I fixed a similar issue in our vendor driver. > But I don't remember what the cause was, I will search the Git log to > see if I can find the commit that fixed it. ok, thanks. -- balbi
Attachment:
signature.asc
Description: Digital signature