On Mon, Jan 23, 2017 at 03:39:43PM +0000, Bryan O'Donoghue wrote: > On 23/01/17 15:13, Johan Hovold wrote: > > On Mon, Jan 23, 2017 at 02:32:48PM +0000, Bryan O'Donoghue wrote: > >> On 23/01/17 12:04, Johan Hovold wrote: > >> > >>> +static void gb_operation_timeout(unsigned long arg) > >>> +{ > >>> + struct gb_operation *operation = (void *)arg; > >>> + > >>> + if (gb_operation_result_set(operation, -ETIMEDOUT)) { > >>> + /* > >>> + * A stuck request message will be cancelled from the > >>> + * workqueue. > >>> + */ > >>> + queue_work(gb_operation_completion_wq, &operation->work); > >>> + } > >>> +} > >>> + > >> > >> If queue_work fails, you will never wake up the waiter. > > > > How could queue_work fail here? The operation result is always set > > exactly once before being queued so this is fine. > > Perhaps it relates to a bug elsewhere, though I struggle to see how we > could feasibly re-use the work associated with an in-flight gb_message. Yeah, and the check you added is even for a different struct delayed_work, so I really don't think we have anything to worry about here. > Could you trap the result code and BUG_ON()/WARN_ON() for it so that we > can debug this a little bit further ? IIUC you've only seen this with some of your unfinished work during development (and for a different work struct), so I don't think we need to add a WARN_ON yet. > >>> - > >>> - ret = wait_for_completion_interruptible_timeout(&operation->completion, > >>> - timeout_jiffies); > >>> + ret = wait_for_completion_interruptible(&operation->completion); > >> > >> Note, that's a case I explicitly handle (failure to queue the work) in > >> the async set I sent out. > > > > Yep, I noticed that, but that was for queueing your timeout work at > > submission. And you queued unconditionally, so you could potentially > > fail to queue if an operation was submitted twice, even if that would in > > itself be a driver bug. > > Have you run this through loopback without any # of in-flight operations > constrained ? No, not yet as that would require also your changes to the loopback driver (the loopback driver still has its custom timeout implementation after this patch). I tested this using gbsim and synchronous operations as my Ara build setup is currently broken. Perhaps you can rerun the tests you saw the queue_work fail with? Are you using gbsim or real hardware to test with? Thanks, Johan _______________________________________________ greybus-dev mailing list greybus-dev@xxxxxxxxxxxxxxxx https://lists.linaro.org/mailman/listinfo/greybus-dev