Oliver Neukum <oneukum@xxxxxxxx> writes:
> On Tue, 2016-05-17 at 21:24 +0200, Bjørn Mork wrote:
>> Oliver Neukum <oneukum@xxxxxxxx> writes:
>>
>> > On Fri, 2016-05-13 at 18:59 +0200, Bjørn Mork wrote:
>> >> Bjørn Mork <bjorn@xxxxxxx> writes:
>> >>
>> >> > The driver enforces a strict one-to-one relationship between the
>> >> > received RESPONSE_AVAILABLE notifications and messages read from
>> >> > the device. At the same time, it will cancel the interrupt URB
>> >> > when there is no client holding the character device open.
>> >>
>> >> Never mind. Forget it.
>> >>
>> >> This patch breaks other devices again. The immediate and unconditional
>> >> reading makes them barf. I guess it can be worked around by delaying the
>> >> flushing until at least one notification is received, but I obviously
>> >> have to test this theory thoroughly on all devices I have.
>> >
>> > Hi,
>> >
>> > I think the best approach would be to keep the interrupt URB always
>> > active. I didn't do this to conserve bandwidth, but if it makes devices
>> > work, it certainly would be the best option.
>>
>> Yes, I considered that. But this implies purging the device message
>> queue without telling userspace that we did so. At least with the
>> current driver design, which is based on a single limited size
>> buffer. If the device queues a number of unsolicited messages between
>> two userspace requests, then we really want all those unsolicited
>> messages delivered to the userspace program on the second request.
>
> You might argue that if user space wants the data it should open the
> device.

Maybe. It's a variant of the current situation, where userspace must
not close the device while a session is in progress.

The issue here is that userspace (and the driver) knows nothing about
what kind of messages the device decides to send, or when. So how can
userspace know that it wants the data? It can't. It has to keep the
device open just in case there is something interesting happening.

This is not the kind of semantics I'd like to present to any userspace
developer. We present a character device as an abstraction of a
hardware device. I believe a reasonable assumption from a userspace
developer is that the driver forwards all messages it reads from the
hardware to the character device. So either we don't read from
hardware when the character device is closed, or we cache everything
we read until the character device is open.

>> And I do think the original bandwidth (and power) conservative approach
>> is worth keeping too. There is no point in waking up these devices
>> unless there actually is an interested userspace application.
>
> They can sleep just fine. I did not imply that runtime PM should
> be disabled.

Yes, which means that we cancel the URBs..

I haven't been able to reproduce it yet, but I think we might
occasionally miss a notification during suspend/resume too. But this
is timing sensitive, and device timing sensitive, so it's difficult to
trigger on purpose. For now I've ignored it. But I wouldn't be
surprised if we end up having to do the same "flush queue" exercise on
every resume too.

>> FWIW, my initial analysis of the problem with the patch was too quick
>> and imprecise. The problem is simply the -EPIPE status we inevitably will
>> hit when the queue is empty, as I should have anticipated. It will be
>> returned to userspace translated to -EIO. I am currently testing a
>> version taking care of that, and it seems to behave well so far. I'll
>> submit it as soon as I am absolutely sure that it works on all WDM, QMI
>> and MBIM devices I have. Might take some time, since I am running out
>> of mini-PCIe and m.2 adapters..
>
> That looks a bit risky. Firstly, if you get -EPIPE after a notification
> it is an error and must be reported as such, so you need an additional
> state.

Yes, -EPIPE should be reported if it occurs later when polling after a
notification. But no additional state is needed. That info is already
available.

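To make the distinction concrete, here is a rough sketch of the rule
described above. It is not taken from cdc-wdm.c: the function name and
the "notified" flag are hypothetical, and the flag only stands in for
whatever information the driver already has about a preceding
RESPONSE_AVAILABLE notification.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only. Decides what the status
 * of a GetEncapsulatedResponse read means, depending on whether a
 * RESPONSE_AVAILABLE notification preceded the request.
 */
int wdm_filter_read_status(int status, bool notified)
{
	/* A stall on an unsolicited (flushing) read just means "queue empty". */
	if (status == -EPIPE && !notified)
		return 0;

	/* A stall after a notification is a real error, reported as -EIO. */
	if (status == -EPIPE)
		return -EIO;

	/* Any other status is passed through unchanged. */
	return status;
}
```

With this rule, a stall seen while flushing the queue never reaches
userspace, while a stall after a notification still does.
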
> And what do you do after -EPIPE? Do you clean up the stall
> or not? And the fun really starts if you get a notification while
> you clean the stall.

No cleanup necessary/possible AFAICS: This is endpoint 0.

> And are you sure all devices can cope with an unsolicited request?

Nope. I am not sure about anything when it comes to USB device
firmware ;) Broad testing is definitely necessary.

But realistically: How can it possibly fail in other ways than
returning 0 data bytes or stalling? Wait... Don't answer that. Yes, I
know. Some device will do something completely wild. I'm just not sure
that it is worth caring about...

The CDC spec isn't exactly clear, but I don't see any restrictions on
the use of GetEncapsulatedResponse there. On the contrary. There are
several examples in the spec referring to the case where the device has
no data. There is nothing identifying this as an error. AFAICS, the
spec allows a strictly polling CDC WDM driver, sending periodic
GetEncapsulatedResponse requests. You don't need to use the interrupt
endpoint if you don't want to.

But the set of specs involved is confusing enough to ensure all sorts
of firmware assumptions. The GetEncapsulatedResponse request is defined
in USBCDC1.2 without any semantics at all. This is fixed in the
CDCWMC1.1 spec, which defines the WDM class among other things. It goes
into detail in section 7:

  "The firmware shall interpret GetEncapsulatedResponse as a request
  to read response bytes.
  The firmware shall send the next wLength bytes from the response.
  The firmware shall allow the host to retrieve data using any number
  of GetEncapsulatedResponse requests.
  The firmware shall return a zero-length reply if there are no data
  bytes available.
  The firmware shall send ResponseAvailable notifications periodically,
  using any appropriate algorithm, to inform the host that there is
  data available in the reply buffer.
  The firmware is allowed to send ResponseAvailable notifications even
  if there is no data available, but this will obviously reduce
  overall performance."

and also

  "The function shall not return STALL in response to
  GetEncapsulatedResponse."

Unfortunately, the CDCMBIM spec refers only to the USBCDC1.2 definition
with the additional MBIM specific message size restrictions. It does
not define its own semantics and it does not refer to the CDCWMC1.1
either.

Logically I don't think anyone intended these specs to define
GetEncapsulatedResponse inconsistently. But they didn't enforce
consistency. Anyone reading just the MBIM spec and its references will
miss the important examples and clarifying comments in CDCWMC1.1.

And when it comes to QMI devices... Those are of course only loosely
modelled after CDC ECM, and only the Qualcomm gods know what's hidden
in there. Could be pretty much anything. They don't seem to care about
open specs.

Well, whatever. None of this matters. What matters is what's
implemented in the devices out there. So testing, testing and testing.

A summary of what we do know so far:

 - Some devices have problems with our current assumption about
   notifications (although CDCWMC1.1 supports that assumption).
 - Some devices will respond with a stall if they have no data buffered
   and receive GetEncapsulatedResponse (although they should not
   according to CDCWMC1.1).

It remains to be seen if there are any devices which cannot cope with
an unexpected GetEncapsulatedResponse.


Bjørn