On Wed, 4 Mar 2009, Pete Zaitcev wrote:

> On Wed, 4 Mar 2009 21:33:24 -0500, Chris Frey <cdfrey@xxxxxxxxxxxxxx> wrote:
>
> > @@ -40,8 +40,6 @@
> >  C Bi:1:002:3 0 12 = 00000c00 06ff0006 00000000
> >  S Ci:1:002:0 s 80 08 0000 0000 0001 1 <
> >  C Ci:1:002:0 0 1 = 01
> > -S Co:1:002:0 s 00 09 0001 0000 0000 0
> > -C Co:1:002:0 0 0
> >
> > This is the usb_set_configuration() that's missing in the timeout version.
> >
> >  S Co:1:002:0 s 02 01 0000 0083 0000 0
>
> OK
>
> > @@ -49,46 +47,6 @@
> >  S Bo:1:002:4 -115 24 = 00001800 07ff0007 52494d20 4465736b 746f7000 00000000
> >  C Bo:1:002:4 0 24 >
> >  S Bi:1:002:3 -115 16384 <
> > -C Bi:1:002:3 0 44 = 00002c00 08050007 52494d20 4465736b 746f7000 00000000 00000000 01000400
> > -S Bi:1:002:3 -115 16384 <
> >  C Bi:1:002:3 0 12 = 00000c00 13050100 00000000
> >
> > So, unless I miss my guess, it looks like somehow the device thinks it has
> > already sent the data, and sends the sequence packet when we're expecting
> > data.  Barry then asks for the actual data, and we timeout, because the
> > device has nothing to give.
>
> OK
>
> > It looks like a bulk message is being dropped or ignored somewhere
> > along the way.
> >
> > Thoughts?
>
> This is not how USB works. Unless I misunderstood you, you think it's
> some kind of overcomplicated Ethernet. But in reality the properties
> of the bus are completely different. You may rest assured in this case
> that the device simply did not send the data.

Not necessarily.  Data _can_ get lost if there's a toggle mismatch
between the host and the device.

Here's how it works.  Each DATA packet contains a toggle bit, and the
toggle changes value each time a packet is acknowledged.  When an
endpoint is initialized or reset, the toggle is cleared.

So for example, consider the first packet the device sends from its
bulk-in endpoint.  The toggle value will of course be 0.
If the device receives back an ACK from the host then it knows that the
host received the packet, so it will change the toggle to 1 and prepare
to send the next packet.  If it doesn't receive an ACK then it will
assume the host did not receive the data, so it will leave the toggle
equal to 0 and retransmit the same data next time.

Now, it's possible that the host _did_ receive the first packet
correctly but the ACK got lost or corrupted.  Then the next time the
host asks for data from that endpoint, it will get a repeat of the old
data.  The host recognizes this by seeing that the toggle is the old
value 0 instead of the expected new value 1.  The host sends back an
ACK so that the device will know the packet was received, but it throws
the data away and asks for another packet.

Suppose the device's toggle has gotten messed up somehow; let's say it
is 0 when it should be 1.  The device will send new data with its 0
toggle value.  The host will see that the toggle is different from what
it expects and so will assume that this is a retransmission of old
data.  So the host will throw the new data away, and now it is lost for
good.  (The same sort of thing will happen if the host's toggle has
been messed up -- what matters is the mismatch between the host and the
device.)

The difficult thing about toggle errors is that you can't see them in
usbmon; the hardware handles everything automatically without telling
the host OS.  You have to use a USB bus analyzer.

> Note though, the packet loss is possible inside the device. For example,
> consider a device which only has one buffer that can only hold one
> message (remember that a USB device cannot send anything to the host
> on its own accord, and must be polled by the host). If the host does
> not schedule a transfer to fetch it, next message will overwrite
> the previous one.
>
> It's difficult to imagine Blackberry to be this retarded. It's more
> common on USB devices made around 8-bit PICs. But who knows.

This is possible too.
And it's possible that the Blackberry isn't quite this stupid but
somehow does manage to mess up a toggle.

Chris said that git bisect identified commit
24c0996a6b73e2554104961afcc8659534503e0d.  As it turns out, this commit
introduced a bug: Sometimes the host's toggle value would be wrong.
Two later commits, 73cb49b8860d9336ee4b24ecbc0d2358aff862f7 and
b7055fa7953a23512ea7d4f97cc5ac209e14a64a, were needed to fix the bug.
I'm pretty confident that it _has_ been fixed -- but I've been known to
be wrong before...

A complete usbmon trace starting from before the device was plugged in
might help to settle the issue.  After all, following a fresh plug-in
the toggle value definitely should be 0 on both the device and the
host, no matter what.

Calling usb_set_altinterface() at the start should always be safe and
it should work around the problem, whatever the ultimate cause is.

Alan Stern
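
For anyone following along, the toggle handshake described above can be
sketched as a toy simulation.  This is only an illustration under
simplifying assumptions, not how any host controller or device firmware
is actually written: the names Device, Host, poll() and run() are
hypothetical, packets are plain strings, and the only fault modelled is
a lost ACK or a wrong starting toggle.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical bulk-in endpoint: a toggle bit plus queued payloads."""
    toggle: int = 0
    queue: list = field(default_factory=list)

    def send(self):
        # Always (re)send the head of the queue with the current toggle.
        return self.toggle, self.queue[0]

    def on_ack(self):
        # An ACK means the host got the packet: advance and flip the toggle.
        self.queue.pop(0)
        self.toggle ^= 1


@dataclass
class Host:
    """Hypothetical host side: expected toggle plus the data it kept."""
    expected: int = 0
    received: list = field(default_factory=list)

    def poll(self, device, ack_lost=False):
        toggle, payload = device.send()
        if toggle == self.expected:
            self.received.append(payload)
            self.expected ^= 1
            result = "accepted"
        else:
            # Looks like a retransmission: ACK it, but throw the data away.
            result = "discarded"
        if not ack_lost:
            device.on_ack()
        return result


def run(payloads, host_expected=0, device_toggle=0, lose_ack_on=()):
    """Poll until the device queue is empty; return (received, log)."""
    device = Device(toggle=device_toggle, queue=list(payloads))
    host = Host(expected=host_expected)
    log = []
    poll_no = 0
    while device.queue:
        log.append(host.poll(device, ack_lost=poll_no in lose_ack_on))
        poll_no += 1
    return host.received, log


if __name__ == "__main__":
    # Lost ACK: the repeat is discarded and nothing is lost.
    print(run(["A", "B"], lose_ack_on={0}))
    # Toggle mismatch (device at 0, host expecting 1): new data is dropped.
    print(run(["data", "sequence"], host_expected=1, device_toggle=0))
```

In the mismatch case the host keeps only the second packet, which is the
same shape as the symptom in the trace: the data packet vanishes and the
sequence packet shows up where data was expected.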