On Fri, Jun 24 2016 at 10:27am -0400,
Lars Ellenberg <lars.ellenberg@xxxxxxxxxx> wrote:

> On Fri, Jun 24, 2016 at 07:36:57PM +0800, Ming Lei wrote:
> > >
> > > This is not a theoretical problem.
> > > At least in DRBD, with an unfortunately high IO concurrency wrt. the
> > > "max-buffers" setting, without this patch we have a reproducible deadlock.
> >
> > Is there any log about the deadlock? And is there any lockdep warning
> > if it is enabled?
>
> In DRBD, to avoid potentially very long internal queues as we wait for
> our replication peer device and local backend, we limit the number of
> in-flight bios we accept, and block in our ->make_request_fn() if that
> number exceeds a configured watermark ("max-buffers").
>
> This works fine, as long as we could assume that once our make_request_fn()
> returns, any bios we "recursively" submitted against the local backend
> would be dispatched. Which used to be the case.

It'd be useful to know whether this patch fixes your issue:
https://patchwork.kernel.org/patch/7398411/

Ming Lei didn't like it due to concerns about IO contexts changing (thereby
breaking merging that occurs via plugging). But if it _does_ fix your
issue, then the case for the change is strengthened, and we just need to
focus on addressing Ming's concerns (Mikulas has some ideas).

Conversely, and in parallel, Mikulas can look to see if your approach
fixes the observed dm-snapshot deadlock that he set out to fix.