On Tue, Feb 07 2017 at 11:58pm -0500,
Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:

> On Tue, Feb 07, 2017 at 09:39:11PM +0100, Pavel Machek wrote:
> > On Mon 2017-02-06 17:49:06, Kent Overstreet wrote:
> > > On Mon, Feb 06, 2017 at 04:47:24PM -0900, Kent Overstreet wrote:
> > > > On Mon, Feb 06, 2017 at 01:53:09PM +0100, Pavel Machek wrote:
> > > > > Still there on v4.9, 36 threads on nokia n900 cellphone.
> > > > >
> > > > > So.. what needs to be done there?
> > >
> > > > But, I just got an idea for how to handle this that might be halfway sane, maybe
> > > > I'll try and come up with a patch...
> > >
> > > Ok, here's such a patch, only lightly tested:
> >
> > I guess it would be nice for me to test it... but what it is against?
> > I tried after v4.10-rc5 and linux-next, but got rejects in both cases.
>
> Sorry, I forgot I had a few other patches in my branch that touch
> mempool/biosets code.
>
> Also, after thinking about it more and looking at the relevant code, I'm pretty
> sure we don't need rescuer threads for block devices that just split bios - i.e.
> most of them, so I changed my patch to do that.
>
> Tested it by ripping out the current->bio_list checks/workarounds from the
> bcache code, appears to work:

Feedback on this patch below, but first:

There are deeper issues with the current->bio_list and rescue workqueues
than thread counts.

I cannot help but feel like you (and Jens) are repeatedly ignoring the
issue that has been raised numerous times, most recently:
https://www.redhat.com/archives/dm-devel/2017-February/msg00059.html

FYI, this test (albeit ugly) can be used to check if the dm-snapshot
deadlock is fixed:
https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html

This situation is the unfortunate pathological worst case for what
happens when changes are merged and nobody wants to own fixing the
unforeseen implications/regressions.
Like everyone else in a position of Linux maintenance I've tried to stay
away from owning the responsibility of a fix -- it isn't working. Ok,
I'll stop bitching now.. I do bear responsibility for not digging in
myself. We're all busy and this issue is "hard".

> -- >8 --
> Subject: [PATCH] block: Make rescuer threads per request_queue, not per bioset
>
> Also, trigger rescuing whenever with bios on current->bio_list, instead
> of only when we block in bio_alloc_bioset(). This is more correct, and
> should result in fewer rescuer threads.
>
> XXX: The current->bio_list plugging needs to be unified with the
> blk_plug mechanism.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> ---
...
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3086da5664..e1b22a68d9 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1490,7 +1490,7 @@ static struct mapped_device *alloc_dev(int minor)
>  	INIT_LIST_HEAD(&md->table_devices);
>  	spin_lock_init(&md->uevent_lock);
>
> -	md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id);
> +	md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id, 0);
>  	if (!md->queue)
>  		goto bad;
>

This should be BLK_QUEUE_NO_RESCUER, as DM isn't making direct use of
bio_queue_split() for its own internal splitting (maybe it should, and
that'd start to fix the issue I've been harping about?), but as is DM
destroys the rescuer workqueue (since commit dbba42d8a9eb "dm: eliminate
unused "bioset" process for each bio-based DM device").

Mike