On Tue, 21 Nov 2017, Mike Snitzer wrote: > On Tue, Nov 21 2017 at 7:43am -0500, > Mike Snitzer <snitzer@xxxxxxxxxx> wrote: > > > Decided it a better use of my time to review and then hopefully use the > > block-core's bio splitting infrastructure in DM. Been meaning to do > > that for quite a while anyway. This mail from you just made it all the > > more clear that needs doing: > > https://www.redhat.com/archives/dm-devel/2017-September/msg00098.html > > > > So I will start here on this patch you proposed: > > https://www.redhat.com/archives/dm-devel/2017-September/msg00091.html > > (of note, this patch slipped through the cracks because I was recovering > > from injury when it originally came through). > > > > Once DM is using q->bio_split I'll come back to this patch (aka > > "[1]") as a starting point for the follow-on work to remove DM's use of > > BIOSET_NEED_RESCUER: > > https://www.redhat.com/archives/dm-devel/2017-August/msg00315.html > > Hey Neil, > > Good news! All your code works ;) > > (well after 1 fixup due to a cut-n-paste bug.. the code you added to > dm_wq_work() to process the md->rescued bio_list was operating on > the md->deferred bio_list due to cut-n-paste from code you copied from > just below it) > > I split your code out some to make it more reviewable. I also tweaked > headers accordingly. > > Please see this branch (which _will_ get rebased between now and the > 4.16 merge window): > https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.16 > > I successfully tested these changes using Mikulas' test program that > reproduces the snapshot deadlock: > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html > > I'll throw various other DM testsuites at it to verify they all look > good (e.g. thinp, cache, multipath). > > I'm open to all suggestions about changes you'd like to see (either to > these patches or anything you'd like to layer ontop of them). > > Thanks for all your work, much appreciated! > Mike This is not correct: 2206 static void dm_wq_work(struct work_struct *work) 2207 { 2208 struct mapped_device *md = container_of(work, struct mapped_device, work); 2209 struct bio *bio; 2210 int srcu_idx; 2211 struct dm_table *map; 2212 2213 if (!bio_list_empty(&md->rescued)) { 2214 struct bio_list list; 2215 spin_lock_irq(&md->deferred_lock); 2216 list = md->rescued; 2217 bio_list_init(&md->rescued); 2218 spin_unlock_irq(&md->deferred_lock); 2219 while ((bio = bio_list_pop(&list))) 2220 generic_make_request(bio); 2221 } 2222 2223 map = dm_get_live_table(md, &srcu_idx); 2224 2225 while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) { 2226 spin_lock_irq(&md->deferred_lock); 2227 bio = bio_list_pop(&md->deferred); 2228 spin_unlock_irq(&md->deferred_lock); 2229 2230 if (!bio) 2231 break; 2232 2233 if (dm_request_based(md)) 2234 generic_make_request(bio); 2235 else 2236 __split_and_process_bio(md, map, bio); 2237 } 2238 2239 dm_put_live_table(md, srcu_idx); 2240 } You can see that if we are in dm_wq_work in __split_and_process_bio, we will not process md->rescued list. The processing of md->rescued is also wrong - bios for different devices must be offloaded to different helper threads, so that processing a bio for a lower device doesn't depend on processing a bio for a higher device. If you offload all the bios on current->bio_list to the same thread, the bios still depend on each other and the deadlock will still happen. Mikulas