RE: g_mass_storage bug ?

Paul Zimmerman <Paul.Zimmerman@xxxxxxxxxxxx> · Thu, 25 Sep 2014 17:50:59 +0000

> From: Felipe Balbi [mailto:balbi@xxxxxx]
> Sent: Wednesday, September 24, 2014 7:41 PM
> 
> On Thu, Sep 25, 2014 at 01:37:05AM +0000, Paul Zimmerman wrote:
> > > > > Then, after the host cleared the halt, the gadget apparently sent the
> > > > > data that _should_ have been sent previously.  The host was expecting
> > > > > to receive the CSW at this point, so there was an overflow error.
> > > > > That's what caused the host to perform a reset.
> > > > >
> > > > > Evidently this UDC implements the set_halt method incorrectly.
> > > > > According to the kerneldoc for usb_ep_set_halt:
> > > > >
> > > > >  * Attempts to halt IN endpoints will fail (returning -EAGAIN) if any
> > > > >  * transfer requests are still queued, or if the controller hardware
> > > > >  * (usually a FIFO) still holds bytes that the host hasn't collected.
> > > >
> > > > damn old bugs :-) I'll fix that up and Cc stable.
> > >
> > > alright fixed. Below you can see a combined diff which I'll still split
> > > into patches so they can be applied properly.
> >
> > [ snipped patch which grubs around in the GDBGFIFOSPACE registers ]
> >
> > Ick. That shouldn't be necessary. The Synopsys vendor driver works
> 
> you need to make sure that both there are no pending transfers and FIFO
> is empty in case of IN endpoints. Databook itself says (sorry, forgot
> the section and no access to the docs right now) that to figure out if
> the FIFO is empty, application can use GDBGFIFOSPACE. Do you have any
> better ideas ?

I guess I'm just afraid this may just be papering over the real cause.

> > fine with the mass-storage gadget (with stall=1) without doing anything
> > like that.
> >
> > When did the dwc3 driver start showing this problem? Wouldn't it be
> > better to find the change which caused this? Or has dwc3 always had
> 
> it's probably been like that since ever :-) I just figured that my
> scripts always had stall=0 and ended up never testing the other way.
> 
> > this problem, but you never tested with stall=1 before so didn't see
> > it?
> 
> yup. Note that it's not enough to checked for TRB completion because
> there could still be data in the FIFO which the host hasn't moved yet,
> unless it's 100% guaranteed that the core won't fire XferComplete until
> every single bit of data has been moved by the host.

That is supposed to be 100% guaranteed. The IN XferInProgress and
XferComplete events are only delivered after the host has ACKed the
packet.

> But the way things are, I'd expect core to be firing transfer complete
> as soon as all data has reached the FIFO (in case of TX).

Nope.

That's why I don't understand how this can happen for IN. AFAIK, a STALL
is only sent in response to something the host sent in the CBW. At that
point, there shouldn't be any IN transfers active.

BTW, about a year ago, I fixed a similar issue in our vendor driver.
But I don't remember what the cause was, I will search the Git log to
see if I can find the commit that fixed it.

-- 
Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html