Re: Blackberry regression

On Wed, 4 Mar 2009, Pete Zaitcev wrote:

> On Wed, 4 Mar 2009 21:33:24 -0500, Chris Frey <cdfrey@xxxxxxxxxxxxxx> wrote:
> 
> > @@ -40,8 +40,6 @@
> >  C Bi:1:002:3 0 12 = 00000c00 06ff0006 00000000
> >  S Ci:1:002:0 s 80 08 0000 0000 0001 1 <
> >  C Ci:1:002:0 0 1 = 01
> > -S Co:1:002:0 s 00 09 0001 0000 0000 0
> > -C Co:1:002:0 0 0
> > 
> > This is the usb_set_configuration() that's missing in the timeout version.
> > 
> >  S Co:1:002:0 s 02 01 0000 0083 0000 0
> 
> OK
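
For reference, in the usbmon text format the five hex words after the "s" are the setup packet: bmRequestType, bRequest, wValue, wIndex and wLength. Below is a rough Python sketch of decoding them. decode_setup() and its request table are hypothetical helpers written for illustration, not part of usbmon or any library. They assume the URB tag and timestamp columns have been stripped, as in the diff above.

```python
# Standard request codes (bRequest) from the USB 2.0 specification.
STD_REQUESTS = {
    0x00: "GET_STATUS", 0x01: "CLEAR_FEATURE", 0x03: "SET_FEATURE",
    0x05: "SET_ADDRESS", 0x06: "GET_DESCRIPTOR", 0x07: "SET_DESCRIPTOR",
    0x08: "GET_CONFIGURATION", 0x09: "SET_CONFIGURATION",
    0x0A: "GET_INTERFACE", 0x0B: "SET_INTERFACE", 0x0C: "SYNCH_FRAME",
}
# Low five bits of bmRequestType select the recipient.
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}


def decode_setup(line):
    """Decode a stripped usbmon line such as 'S Co:1:002:0 s 00 09 0001 0000 0000 0'."""
    fields = line.split()
    if len(fields) < 8 or fields[0] != "S" or fields[2] != "s":
        raise ValueError("not a Submit line carrying a setup packet: %r" % line)
    request_type = int(fields[3], 16)
    request = int(fields[4], 16)
    value, index, length = (int(f, 16) for f in fields[5:8])
    # Bits 5..6 of bmRequestType: 0 = standard, 1 = class, 2 = vendor.
    is_standard = ((request_type >> 5) & 0x3) == 0
    return {
        "pipe": fields[1],
        "direction": "IN" if request_type & 0x80 else "OUT",
        "recipient": RECIPIENTS.get(request_type & 0x1F, "reserved"),
        "request": STD_REQUESTS.get(request, "unknown") if is_standard else "class/vendor",
        "value": value,
        "index": index,
        "length": length,
    }
```

Applied to the three control lines in the diff, it reports GET_CONFIGURATION, then SET_CONFIGURATION with wValue 1 (the call missing from the timeout case), then CLEAR_FEATURE addressed to an endpoint with wIndex 0x0083, i.e. a clear-halt on bulk-in endpoint 3. The USB 2.0 spec says a ClearFeature(ENDPOINT_HALT) also reinitializes that endpoint's data toggle to DATA0 on the device.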
> 
> > @@ -49,46 +47,6 @@
> >  S Bo:1:002:4 -115 24 = 00001800 07ff0007 52494d20 4465736b 746f7000 00000000
> >  C Bo:1:002:4 0 24 >
> >  S Bi:1:002:3 -115 16384 <
> > -C Bi:1:002:3 0 44 = 00002c00 08050007 52494d20 4465736b 746f7000 00000000 00000000 01000400
> > -S Bi:1:002:3 -115 16384 <
> >  C Bi:1:002:3 0 12 = 00000c00 13050100 00000000
> 
> > So, unless I miss my guess, it looks like somehow the device thinks it has
> > already sent the data, and sends the sequence packet when we're expecting
> > data.  Barry then asks for the actual data, and we timeout, because the
> > device has nothing to give.
> 
> OK
> 
> > It looks like a bulk message is being dropped or ignored somewhere
> > along the way.
> > 
> > Thoughts?
> 
> This is not how USB works. Unless I misunderstood you, you think it's
> some kind of overcomplicated Ethernet. But in reality the properties
> of the bus are completely different. You may rest assured in this case
> that the device simply did not send the data.

Not necessarily.  Data _can_ get lost if there's a toggle mismatch 
between the host and the device.

Here's how it works.  Each DATA packet contains a toggle bit, and the 
toggle changes value each time a packet is acknowledged.  When an 
endpoint is initialized or reset, the toggle is cleared.

So for example, consider the first packet the device sends from its
bulk-in endpoint.  The toggle value will of course be 0.  If the device
receives back an ACK from the host then it knows that the host received
the packet, so it will change the toggle to 1 and prepare to send the
next packet.  If it doesn't receive an ACK then it will assume the host
did not receive the data, so it will leave the toggle equal to 0 and
retransmit the same data next time.
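
The device-side rule above is small enough to write down. Here is a toy Python model, purely illustrative (BulkInEndpoint and its methods are invented names, not any real firmware API): the toggle and the queued data advance only when an ACK comes back.

```python
from dataclasses import dataclass


@dataclass
class BulkInEndpoint:
    """Toy model of the device side of a bulk-in endpoint."""
    queue: list          # payloads waiting to be sent to the host
    toggle: int = 0      # cleared when the endpoint is initialized or reset

    def next_packet(self):
        """Return (toggle, payload) for the packet at the head of the queue."""
        return self.toggle, self.queue[0]

    def handshake(self, acked):
        """Update state after sending a packet; acked says whether an ACK arrived."""
        if acked:
            # Host got the packet: move on to new data with the other toggle.
            self.queue.pop(0)
            self.toggle ^= 1
        # No ACK: leave toggle and data alone so the same packet is resent.
```

With queue ["A", "B"], a missing ACK leaves next_packet() at (0, "A"); once the ACK arrives it becomes (1, "B").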

Now, it's possible that the host _did_ receive the first packet
correctly but the ACK got lost or corrupted.  Then the next time the
host asks for data from that endpoint, it will get a repeat of the old
data.  The host recognizes this by seeing that the toggle is the old
value 0 instead of the expected new value 1.  The host sends back an
ACK so that the device will know the packet was received, but it throws
the data away and asks for another packet.
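
The host's half can be sketched the same way. This is again a toy Python model with hypothetical names (BulkInPipe, receive()). It keeps the expected toggle, always ACKs, and hands data up only when the toggle matches.

```python
class BulkInPipe:
    """Toy model of the host side of a bulk-in pipe."""

    def __init__(self):
        self.expected = 0    # toggle value the host expects next
        self.received = []   # payloads actually passed up to the driver

    def receive(self, toggle, payload):
        """Handle one DATA packet; returns True because the host always ACKs."""
        if toggle == self.expected:
            self.received.append(payload)   # new data: keep it
            self.expected ^= 1
        # Otherwise it is a retransmission of data we already have:
        # ACK it so the device moves on, but throw the payload away.
        return True
```

Replaying the lost-ACK case (receive (0, "A"), then the retransmitted (0, "A"), then (1, "B")) leaves received == ["A", "B"]: the duplicate is silently dropped.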

Suppose the device's toggle has gotten messed up somehow; let's say it
is 0 when it should be 1.  The device will send new data with its 0
toggle value.  The host will see that the toggle is different from what
it expects and so will assume that this is a retransmission of old
data.  So the host will throw the new data away, and now it is lost for
good.  (The same sort of thing will happen if the host's toggle has
been messed up -- what matters is the mismatch between the host and the 
device.)
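
Putting both sides together shows how a mismatch loses data even though every packet and every ACK gets through. transfer() below is a hypothetical toy in Python, not a model of any real host controller.

```python
def transfer(payloads, device_toggle=0, host_toggle=0):
    """Send payloads device->host and return the ones the host keeps.

    Toy model: no packet or ACK is ever corrupted, so the only thing
    that can go wrong is a toggle mismatch between the two sides.
    """
    kept = []
    for payload in payloads:
        if device_toggle == host_toggle:
            kept.append(payload)    # expected toggle: host accepts new data
            host_toggle ^= 1
        # On a mismatch the host assumes a retransmission and drops the data.
        # Either way it ACKs, so the device flips its toggle and moves on.
        device_toggle ^= 1
    return kept
```

transfer(["A", "B", "C"]) keeps everything, but transfer(["data", "sequence"], device_toggle=0, host_toggle=1) returns only ["sequence"]. In this model exactly one packet vanishes, after which the two sides agree again, which would look much like a reply going missing while the packet after it arrives normally.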

The difficult thing about toggle errors is that you can't see them in 
usbmon; the hardware handles everything automatically without telling 
the host OS.  You have to use a USB bus analyzer.

> Note though, the packet loss is possible inside the device. For example,
> consider a device which only has one buffer that can only hold one
> message (remember that a USB device cannot send anything to the host
> on its own accord, and must be polled by the host). If the host does
> not schedule a transfer to fetch it, next message will overwrite
> the previous one.
> 
> It's difficult to imagine Blackberry to be this retarded. It's more
> common on USB devices made around 8-bit PICs. But who knows.

This is possible too.  And it's possible that the Blackberry isn't 
quite this stupid but somehow does manage to mess up a toggle.
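
Pete's single-buffer scenario is a different failure, one where no toggle is involved at all. A toy Python model of it follows (SingleBufferDevice is a made-up name, not a claim about how the Blackberry firmware works). The device can only hand over data when polled, and a second message overwrites an unread first one.

```python
class SingleBufferDevice:
    """Toy device with room for exactly one outgoing message."""

    def __init__(self):
        self.slot = None

    def produce(self, message):
        # Firmware queues a message; any unread one is silently overwritten.
        self.slot = message

    def poll(self):
        # Host issues an IN transfer; the device can only answer when polled.
        message, self.slot = self.slot, None
        return message
```

If the device produces its data message and then its sequence message with no poll in between, the host only ever sees the second one. In a usbmon trace that would also show up as the earlier message simply never arriving.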

Chris said that git bisect identified commit
24c0996a6b73e2554104961afcc8659534503e0d.  As it turns out, this commit
introduced a bug: Sometimes the host's toggle value would be wrong.  
Two later commits, 73cb49b8860d9336ee4b24ecbc0d2358aff862f7 and
b7055fa7953a23512ea7d4f97cc5ac209e14a64a, were needed to fix the bug.  
I'm pretty confident that it _has_ been fixed -- but I've been known to 
be wrong before...

A complete usbmon trace starting from before the device was plugged in might
help to settle the issue.  After all, following a fresh plug-in the 
toggle value definitely should be 0 on both the device and the host, no 
matter what.

Calling usb_set_altinterface() at the start should always be safe and 
it should work around the problem, whatever the ultimate cause is.
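
Why a reset at the start helps can be shown with a toy Python model. transfer() and set_interface() are hypothetical helpers. The model assumes that SET_INTERFACE puts both sides back to DATA0: the USB 2.0 spec requires this of the device, and the host stack is assumed to clear its own copy of the toggle at the same time.

```python
def transfer(payloads, device_toggle, host_toggle):
    """Toy device->host bulk transfer where every packet and ACK gets through.

    Returns (payloads the host kept, final device toggle, final host toggle).
    """
    kept = []
    for payload in payloads:
        if device_toggle == host_toggle:
            kept.append(payload)    # expected toggle: host accepts new data
            host_toggle ^= 1
        # On a mismatch the host drops the data as a "retransmission",
        # but it still ACKs, so the device flips its toggle either way.
        device_toggle ^= 1
    return kept, device_toggle, host_toggle


def set_interface():
    """Hypothetical model of SET_INTERFACE's effect on a bulk endpoint:
    both the device's and the host's toggles return to DATA0."""
    return 0, 0
```

Starting from a stale mismatch (device 1, host 0), transfer(["A", "B"], 1, 0) loses "A". After set_interface() both toggles are 0 and the same transfer keeps both payloads, whatever state the endpoint was left in beforehand.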

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
