Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"

Tejun Heo <tj@xxxxxxxxxx> · Tue, 15 Jan 2013 15:50:43 -0800

cc'ing Arjan.  Arjan, the original thread can be read from

  http://thread.gmane.org/gmane.linux.kernel/1420814

Hello, again.

On Tue, Jan 15, 2013 at 12:18:01PM -0800, Linus Torvalds wrote:
> I think that is a good solution if it works, but look out: we need to
> synchronize across *all* domains, not just the default one.  The sd.c
> code, for example, uses its own "scsi_sd_probe_domain", and we *do*
> want to synchronize with it.
> 
> Can you do that with your suggested interface (i.e. it would have to be
> a *global* sequence number)?

So, I've been thinking about it for a while now, and it looks like
async cuts too many corners for any sane stackable flushing scheme to
be implemented on top of it.  It simply doesn't record enough
information to determine who should wait for what.
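To illustrate what a single cookie space does and does not capture, here is a minimal userspace model in C. It is a hypothetical sketch, not the kernel's async code: `model_schedule()`, `model_complete()`, `model_sync_would_wait()` and `MODEL_ANY_DOMAIN` are invented names. Every job takes its cookie from one counter shared by all domains, so cookies form the kind of global sequence number Linus asks about, but a synchronize call can only ask whether anything with a lower cookie is still running; nothing records who is waiting for what.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, NOT the kernel's async implementation. */
#define MODEL_MAX_JOBS   64
#define MODEL_ANY_DOMAIN (-1)   /* "synchronize across all domains" */

typedef unsigned long long model_cookie_t;

struct model_job {
	model_cookie_t cookie;
	int domain;     /* e.g. 0 = default domain, 1 = a driver-private one */
	bool done;
};

static struct model_job jobs[MODEL_MAX_JOBS];
static size_t nr_jobs;
static model_cookie_t next_cookie = 1;   /* ONE counter shared by all domains */

/* Queue a job in @domain and return its globally ordered cookie. */
static model_cookie_t model_schedule(int domain)
{
	assert(nr_jobs < MODEL_MAX_JOBS);
	jobs[nr_jobs].cookie = next_cookie++;
	jobs[nr_jobs].domain = domain;
	jobs[nr_jobs].done = false;
	nr_jobs++;
	return jobs[nr_jobs - 1].cookie;
}

/* Mark the job holding @cookie as finished. */
static void model_complete(model_cookie_t cookie)
{
	for (size_t i = 0; i < nr_jobs; i++)
		if (jobs[i].cookie == cookie)
			jobs[i].done = true;
}

/*
 * Would "wait for everything before @cookie" in @domain still block?
 * Only the ordering is consulted: the model has no idea who the caller
 * is or what it actually depends on.
 */
static bool model_sync_would_wait(model_cookie_t cookie, int domain)
{
	for (size_t i = 0; i < nr_jobs; i++) {
		const struct model_job *j = &jobs[i];

		if (j->done || j->cookie >= cookie)
			continue;
		if (domain == MODEL_ANY_DOMAIN || j->domain == domain)
			return true;
	}
	return false;
}
```

With a default-domain job, a private-domain job and another default-domain job queued in that order, a domain-0 wait on the third cookie can succeed once the first job completes, while the all-domains wait still blocks on the private job.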

I've thought of two workarounds.  Both suck.

A. Try to detect deadlock conditions from synchronize().  If a
   deadlock condition involving other async jobs is detected, whine
   about it and skip the wait.  Ignore deadlock conditions on self
   (that should solve this particular case).

   Detecting deadlock conditions isn't difficult if there are only
   global synchronizations; unfortunately, the fragmented dependencies
   introduced by domain-local synchronization make this non-trivial.

   We can still do the ignore-self part almost trivially, though.
   This will at least work around the problem at hand.

B. The ranged synchronization I first suggested.  The problem with
   this is that it's common practice for an async job to flush
   everything that comes before it.  This can introduce spurious
   synchronization dependencies, which can then lead to deadlocks.

   These conditions can be detected and ignored, at least when only
   global synchronizations are considered.  The problem is that such
   deadlock conditions will occur under normal usage and thus would
   have to be ignored silently, which basically means synchronization
   would silently skip the wait and report success even when there are
   legitimate deadlocks which should be investigated.
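Both workarounds boil down to a wait-for check. The sketch below assumes, as a hypothetical extension, that each blocked job records the cookie range it is waiting on (information the current scheme does not keep); `wait_limit[]`, `waits_for()` and `model_deadlocked()` are invented for illustration, and domains are ignored, i.e. only global synchronizations are modelled. The scenario mirrors B: job 1 waits up to and including job 3, which it scheduled, while job 3 follows the common practice of flushing everything before it and so waits for job 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: job i has cookie i, for i in 1..MODEL_NR_JOBS. */
#define MODEL_NR_JOBS 8

static bool done[MODEL_NR_JOBS + 1];

/*
 * wait_limit[i] == 0: job i is not blocked in a synchronize.
 * Otherwise job i waits until every OTHER unfinished job with a cookie
 * below wait_limit[i] has finished (ranged, global synchronization).
 */
static unsigned wait_limit[MODEL_NR_JOBS + 1];

/* Does blocked job @w currently wait for job @t? */
static bool waits_for(unsigned w, unsigned t)
{
	return wait_limit[w] != 0 && t != w && !done[t] && t < wait_limit[w];
}

/* Depth-first search: can @from reach @target along wait-for edges? */
static bool reaches(unsigned from, unsigned target, bool *seen)
{
	if (seen[from])
		return false;
	seen[from] = true;
	for (unsigned t = 1; t <= MODEL_NR_JOBS; t++) {
		if (!waits_for(from, t))
			continue;
		if (t == target || reaches(t, target, seen))
			return true;
	}
	return false;
}

/* A job is deadlocked if the jobs it waits for (transitively) wait for it. */
static bool model_deadlocked(unsigned job)
{
	bool seen[MODEL_NR_JOBS + 1] = { false };

	return reaches(job, job, seen);
}
```

Setting `wait_limit[1] = 4` and `wait_limit[3] = 3` makes `model_deadlocked(1)` true; clearing job 3's flush makes it false. The check cannot tell the spurious edge from job 3 to job 1 apart from a legitimate one, which is why B would have to ignore such cycles silently.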

For now, I'm gonna implement simple "I'm not gonna wait for myself"
self-deadlock avoidance.  If this needs any more sophistication, I
think we'd better reimplement it so that we can explicitly match up
and track who's gonna wait for what, instead of throwing everything
into a single cookie space and then trying to work back from there.
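The "I'm not gonna wait for myself" rule can be sketched as a hypothetical userspace model. The names are invented for illustration, and `current_cookie` stands in for whatever per-task state would tell synchronize() which async job, if any, the caller is running in. A full synchronize issued from inside job 1 counts job 1 as pending forever unless it skips its own entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, NOT the kernel's async code. */
#define MODEL_MAX_JOBS 16

struct model_job {
	unsigned long cookie;
	bool done;
};

static struct model_job jobs[MODEL_MAX_JOBS];
static size_t nr_jobs;

/*
 * Stand-in for per-task state: cookie of the async job the caller is
 * running in, or 0 when called from outside any async job.
 */
static unsigned long current_cookie;

static void model_add_job(unsigned long cookie)
{
	assert(nr_jobs < MODEL_MAX_JOBS);
	jobs[nr_jobs].cookie = cookie;
	jobs[nr_jobs].done = false;
	nr_jobs++;
}

static void model_finish(unsigned long cookie)
{
	for (size_t i = 0; i < nr_jobs; i++)
		if (jobs[i].cookie == cookie)
			jobs[i].done = true;
}

/*
 * Number of jobs a full synchronize would still wait for.  With
 * @ignore_self set, the caller's own job is skipped: it cannot finish
 * while it is blocked here, so waiting on it would never return.
 */
static size_t model_sync_full_pending(bool ignore_self)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_jobs; i++) {
		if (jobs[i].done)
			continue;
		if (ignore_self && jobs[i].cookie == current_cookie)
			continue;
		n++;
	}
	return n;
}
```

This removes only the self-edge. A cycle through another job, the harder case in A, would still hang, which is why anything more sophisticated calls for explicitly tracking waiters.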

Thanks.

-- 
tejun