Hello Robert, Sorry for being much too late here, but during recent attemts to debug issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due to missing notifications") I believe I found a couple of issues with commit c1da59dad0eb. At least one of them is serious (potentional GPF_KERNEL allocation in interrupt context). But I don't know if I understand the problem the commit set out to solves, which is why I have no proposed patch here. > @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb) > } > } > > - desc->rerr = status; > + /* > + * only set a new error if there is no previous error. > + * Errors are only cleared during read/open > + */ > + if (desc->rerr == 0) > + desc->rerr = status; > + > if (length + desc->length > desc->wMaxCommand) { > /* The buffer would overflow */ > set_bit(WDM_OVERFLOW, &desc->flags); > This is minor, but in my opinion this change is flawed. It changes the meaning of "desc->rerr" from the original "last status code" to the new "first error since last userspace read". wdm_read will only return and clear such errors if desc->length == 0 on entry. This is up to the userspace application to decide. I claim that any error should either be reported immediately or not at all. Caching errors forever like this, and then reporting them to userspace at some arbitrary later event, is pointless. The existing code did the correct thing in the absence of proper event queing. The only real alternative would be to completely stop processing device events until userspace has seen the error. But that would just make things fail completely for no good reason. We should probably consider making a queue/list of the read buffers and keep each status code with the associated list element. Then we could report back the error when userspace reads the element that caused it. > @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb) > } > > skip_error: > + set_bit(WDM_READ, &desc->flags); > wake_up(&desc->wait); > > - set_bit(WDM_READ, &desc->flags); > + if (desc->rerr) { > + /* > + * Since there was an error, userspace may decide to not read > + * any data after poll'ing. > + * We should respond to further attempts from the device to send > + * data, so that we can get unstuck. > + */ > + service_outstanding_interrupt(desc); > + } > + > unlock: > spin_unlock(&desc->iuspin); > } We cannot do this. This is an URB callback, and service_outstanding_interrupt() calls usb_submit_urb(...,GFP_KERNEL). This issue could of course be avoided by simply changing the allocation flags. But looking at the comment here, I am pretty sure that this is the wrong solution to the unanticipated userspace behaviour. If userspace decides not to read() if poll() sets POLLERR, then it *should not* expect the error condition to be cleared. Or am I missing something? In any case: This will not unstick anything, given the previous problem with desc->rerr not being updated unless userspace reads. You will just flush the device queue, but the error condition is still there. So poll() will still set POLLERR. If the comment is correct, this is a permanent deadlock. userspace can avoid tge issue by reading the descriptor on POLLERR. If that is not the way this is expected to work (I am by no means a POSIX expert), then I believe the correct fix must be to change wdm_poll(), either to clear the error condition or to not set POLLERR in this case. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html