On Tue, 21 Nov 2017, Mike Snitzer wrote:

> On Tue, Nov 21 2017 at  4:23pm -0500,
> Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> 
> > This is not correct:
> > 
> >    2206 static void dm_wq_work(struct work_struct *work)
> >    2207 {
> >    2208         struct mapped_device *md = container_of(work, struct mapped_device, work);
> >    2209         struct bio *bio;
> >    2210         int srcu_idx;
> >    2211         struct dm_table *map;
> >    2212 
> >    2213         if (!bio_list_empty(&md->rescued)) {
> >    2214                 struct bio_list list;
> >    2215                 spin_lock_irq(&md->deferred_lock);
> >    2216                 list = md->rescued;
> >    2217                 bio_list_init(&md->rescued);
> >    2218                 spin_unlock_irq(&md->deferred_lock);
> >    2219                 while ((bio = bio_list_pop(&list)))
> >    2220                         generic_make_request(bio);
> >    2221         }
> >    2222 
> >    2223         map = dm_get_live_table(md, &srcu_idx);
> >    2224 
> >    2225         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> >    2226                 spin_lock_irq(&md->deferred_lock);
> >    2227                 bio = bio_list_pop(&md->deferred);
> >    2228                 spin_unlock_irq(&md->deferred_lock);
> >    2229 
> >    2230                 if (!bio)
> >    2231                         break;
> >    2232 
> >    2233                 if (dm_request_based(md))
> >    2234                         generic_make_request(bio);
> >    2235                 else
> >    2236                         __split_and_process_bio(md, map, bio);
> >    2237         }
> >    2238 
> >    2239         dm_put_live_table(md, srcu_idx);
> >    2240 }
> > 
> > You can see that if we are in dm_wq_work in __split_and_process_bio, we
> > will not process md->rescued list.
> 
> Can you elaborate further?  We cannot be "in dm_wq_work in
> __split_and_process_bio" simultaneously.  Do you mean as a side-effect
> of scheduling away from __split_and_process_bio?
> 
> The more detail you can share the better.

Suppose this scenario:

* dm_wq_work calls __split_and_process_bio
* __split_and_process_bio eventually reaches the function snapshot_map
* snapshot_map attempts to take the snapshot lock
* the snapshot lock could be released only if some bios submitted by the
  snapshot driver to the underlying device complete
* the bios submitted to the underlying device were already offloaded by
  some other task and they are waiting on the list md->rescued
* the bios waiting on md->rescued are not processed, because dm_wq_work
  is blocked in snapshot_map (called from __split_and_process_bio)

> > The processing of md->rescued is also wrong - bios for different devices
> > must be offloaded to different helper threads, so that processing a bio
> > for a lower device doesn't depend on processing a bio for a higher
> > device. If you offload all the bios on current->bio_list to the same
> > thread, the bios still depend on each other and the deadlock will still
> > happen.
> 
> Commit 325738403 ("dm: revise 'rescue' strategy for bio-based bioset
> allocations") speaks to this with:
> 
> "Note that only current->bio_list[0] is offloaded.  current->bio_list[1]
> contains bios that were scheduled *before* the current one started, so
> they must have been submitted from higher up the stack, and we cannot be
> waiting for them here (thanks to the "dm: ensure bio submission follows
> a depth-first tree walk" commit).  Also, we now rescue *all* bios on the
> list as there is nothing to be gained by being more selective."

I think you are right - if we only offload current->bio_list[0], then
mixing of dependent bios on the offloaded list won't happen.

> And again: this patchset passes your dm-snapshot deadlock test.  Is
> that test somehow lacking?

With your patchset, the deadlock would happen only if bios are queued on
&md->deferred - and that happens only in case of resume or if we are
processing REQ_PREFLUSH with non-zero data size.
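For reference, the flush-with-data case ends up on md->deferred via
dec_pending() in drivers/md/dm.c, roughly like this (paraphrased from
that era's code, not a verbatim quote):

	/* the empty-flush part of a flush-with-data has completed */
	if ((bio->bi_opf & REQ_PREFLUSH) && bio->bi_iter.bi_size) {
		/* reissue the data part without REQ_PREFLUSH */
		bio->bi_opf &= ~REQ_PREFLUSH;
		queue_io(md, bio);	/* adds the bio to md->deferred */
	}

and dm_wq_work then picks that bio up from md->deferred.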
So, the simple test that I wrote doesn't trigger it, but a more complex
test involving REQ_PREFLUSH could.

> Or do you see a hypothetical case where a deadlock is still possible?
> That is of less concern.  I'd prefer that we tackle problems for
> targets, and associated scenarios, that we currently support.
> 
> Either way, happy to review this with you further.  Any fixes are
> welcomed too.  But I'd like us to head in a direction that this patchset
> is taking us.  Specifically: away from DM relying on BIOSET_NEED_RESCUER.
> 
> Thanks,
> Mike

Mikulas
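P.S. To make the "different helper threads" idea above concrete, here is a
purely illustrative sketch (not a patch; the names dm_rescuer,
dm_rescue_work and dm_offload_to_rescuer are made up for the example).
The point is that each device would drain its own rescued list from its
own workqueue, independent of md->wq, so rescuing bios destined for a
lower device never waits on a worker that is blocked inside a higher
device's map function:

	struct dm_rescuer {
		spinlock_t lock;
		struct bio_list rescued;
		struct work_struct work;
		struct workqueue_struct *wq;	/* one per device, WQ_MEM_RECLAIM */
	};

	static void dm_rescue_work(struct work_struct *work)
	{
		struct dm_rescuer *r = container_of(work, struct dm_rescuer, work);
		struct bio_list list;
		struct bio *bio;

		/* take the whole rescued list and resubmit it */
		spin_lock_irq(&r->lock);
		list = r->rescued;
		bio_list_init(&r->rescued);
		spin_unlock_irq(&r->lock);

		while ((bio = bio_list_pop(&list)))
			generic_make_request(bio);
	}

	static void dm_offload_to_rescuer(struct dm_rescuer *r, struct bio *bio)
	{
		/* route the bio to the rescuer of the device it targets */
		spin_lock_irq(&r->lock);
		bio_list_add(&r->rescued, bio);
		spin_unlock_irq(&r->lock);
		queue_work(r->wq, &r->work);
	}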