issues with commit c1da59dad0eb ("cdc-wdm: Clear read pipeline in case of error")

Bjørn Mork <bjorn@xxxxxxx> · Thu, 20 Apr 2017 10:32:10 +0200

Hello Robert,

Sorry for being much too late here, but during recent attemts to debug
issues caused by my commit 833415a3e781 ("cdc-wdm: fix "out-of-sync" due
to missing notifications") I believe I found a couple of issues with
commit c1da59dad0eb. At least one of them is serious (potentional
GPF_KERNEL allocation in interrupt context).

But I don't know if I understand the problem the commit set out to
solves, which is why I have no proposed patch here.

> @@ -189,7 +192,13 @@ static void wdm_in_callback(struct urb *urb)
>  		}
>  	}
>  
> -	desc->rerr = status;
> +	/*
> +	 * only set a new error if there is no previous error.
> +	 * Errors are only cleared during read/open
> +	 */
> +	if (desc->rerr  == 0)
> +		desc->rerr = status;
> +
>  	if (length + desc->length > desc->wMaxCommand) {
>  		/* The buffer would overflow */
>  		set_bit(WDM_OVERFLOW, &desc->flags);
> 

This is minor, but in my opinion this change is flawed.  It changes the
meaning of "desc->rerr" from the original "last status code" to the new
"first error since last userspace read".

wdm_read will only return and clear such errors if desc->length == 0 on
entry.  This is up to the userspace application to decide. I claim that
any error should either be reported immediately or not at all.  Caching
errors forever like this, and then reporting them to userspace at some
arbitrary later event, is pointless. 

The existing code did the correct thing in the absence of proper event
queing.  The only real alternative would be to completely stop
processing device events until userspace has seen the error.  But that
would just make things fail completely for no good reason.

We should probably consider making a queue/list of the read buffers and
keep each status code with the associated list element.  Then we could
report back the error when userspace reads the element that caused it.

> @@ -221,9 +230,19 @@ static void wdm_in_callback(struct urb *urb)
>  	}
>  
>  skip_error:
> +	set_bit(WDM_READ, &desc->flags);
>  	wake_up(&desc->wait);
>  
> -	set_bit(WDM_READ, &desc->flags);
> +	if (desc->rerr) {
> +		/*
> +		 * Since there was an error, userspace may decide to not read
> +		 * any data after poll'ing.
> +		 * We should respond to further attempts from the device to send
> +		 * data, so that we can get unstuck.
> +		 */
> +		service_outstanding_interrupt(desc);
> +	}
> +
>  unlock:
>  	spin_unlock(&desc->iuspin);
>   }

We cannot do this.  This is an URB callback, and service_outstanding_interrupt()
calls  usb_submit_urb(...,GFP_KERNEL).

This issue could of course be avoided by simply changing the allocation
flags. But looking at the comment here, I am pretty sure that this is
the wrong solution to the unanticipated userspace behaviour.  If
userspace decides not to read() if poll() sets POLLERR, then it *should
not* expect the error condition to be cleared. Or am I missing something?

In any case: This will not unstick anything, given the previous problem
with desc->rerr not being updated unless userspace reads.  You will just
flush the device queue, but the error condition is still there. So
poll() will still set POLLERR.  If the comment is correct, this is a
permanent deadlock.

userspace can avoid tge issue by reading the descriptor on POLLERR. If
that is not the way this is expected to work (I am by no means a POSIX
expert), then I believe the correct fix must be to change wdm_poll(),
either to clear the error condition or to not set POLLERR in this case.

Bjørn
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html