Re: new patchset to eliminate DM's use of BIOSET_NEED_RESCUER

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 21 2017 at 11:00pm -0500,
NeilBrown <neilb@xxxxxxxx> wrote:

> On Tue, Nov 21 2017, Mikulas Patocka wrote:
> 
> > On Tue, 21 Nov 2017, Mike Snitzer wrote:
> >
> >> On Tue, Nov 21 2017 at  4:23pm -0500,
> >> Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> >> 
> >> > This is not correct:
> >> > 
> >> >    2206 static void dm_wq_work(struct work_struct *work)
> >> >    2207 {
> >> >    2208         struct mapped_device *md = container_of(work, struct mapped_device, work);
> >> >    2209         struct bio *bio;
> >> >    2210         int srcu_idx;
> >> >    2211         struct dm_table *map;
> >> >    2212
> >> >    2213         if (!bio_list_empty(&md->rescued)) {
> >> >    2214                 struct bio_list list;
> >> >    2215                 spin_lock_irq(&md->deferred_lock);
> >> >    2216                 list = md->rescued;
> >> >    2217                 bio_list_init(&md->rescued);
> >> >    2218                 spin_unlock_irq(&md->deferred_lock);
> >> >    2219                 while ((bio = bio_list_pop(&list)))
> >> >    2220                         generic_make_request(bio);
> >> >    2221         }
> >> >    2222
> >> >    2223         map = dm_get_live_table(md, &srcu_idx);
> >> >    2224
> >> >    2225         while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> >> >    2226                 spin_lock_irq(&md->deferred_lock);
> >> >    2227                 bio = bio_list_pop(&md->deferred);
> >> >    2228                 spin_unlock_irq(&md->deferred_lock);
> >> >    2229
> >> >    2230                 if (!bio)
> >> >    2231                         break;
> >> >    2232
> >> >    2233                 if (dm_request_based(md))
> >> >    2234                         generic_make_request(bio);
> >> >    2235                 else
> >> >    2236                         __split_and_process_bio(md, map, bio);
> >> >    2237         }
> >> >    2238
> >> >    2239         dm_put_live_table(md, srcu_idx);
> >> >    2240 }
> >> > 
> >> > You can see that if we are in dm_wq_work in __split_and_process_bio, we 
> >> > will not process md->rescued list.
> >> 
> >> Can you elaborate further?  We cannot be "in dm_wq_work in
> >> __split_and_process_bio" simultaneously.  Do you mean as a side-effect
> >> of scheduling away from __split_and_process_bio?
> >> 
> >> The more detail you can share the better.
> >
> > Suppose this scenario:
> >
> > * dm_wq_work calls __split_and_process_bio
> > * __split_and_process_bio eventually reaches the function snapshot_map
> > * snapshot_map attempts to take the snapshot lock
> >
> > * the snapshot lock could be released only if some bios submitted by the 
> > snapshot driver to the underlying device complete
> > * the bios submitted to the underlying device were already offloaded by 
> > some other task and they are waiting on the list md->rescued
> > * the bios waiting on md->rescued are not processed, because dm_wq_work is 
> > blocked in snapshot_map (called from __split_and_process_bio)
> 
> Yes, I think you are right. 
> 
> I think the solution is to get rid of the dm_offload() infrastructure
> and make it not necessary.
> i.e. discard my patches
>     dm: prepare to discontinue use of BIOSET_NEED_RESCUER
> and
>     dm: revise 'rescue' strategy for bio-based bioset allocations
> 
> And build on "dm: ensure bio submission follows a depth-first tree walk"
> which was written after those and already makes dm_offload() less
> important.
> 
> Since that "depth-first" patch, every request to the dm device, after
> the initial splitting, allocates just one dm_target_io structure, and
> makes just one __map_bio() call, and so will behave exactly the way
> generic_make_request() expects and copes with - thus avoiding awkward
> dependencies and deadlocks.  Except....

Yes, FYI I've also verified that even with just the "depth-first" patch
(and dm_offload disabled) the snapshot deadlock is fixed.
 
> a/ If any target defines ->num_write_bios() to return >1,
>    __clone_and_map_data_bio() will make multiple calls to alloc_tio()
>    and __map_bio(), which might need rescuing.
>    But no target defines num_write_bios, and none have since it was
>    removed from dm-cache 4.5 years ago.
>    Can we discard num_write_bios??

Yes.

> b/ If any target sets any of num_{flush,discard,write_same,write_zeroes}_bios
>    to a value > 1, then __send_duplicate_bios() will also make multiple
>    calls to alloc_tio() and __map_bio().
>    Some do.
>      dm-cache-target:  flush=2
>      dm-snap: flush=2
>      dm-stripe: discard, write_same, write_zeroes all set to 'stripes'.
> 
> These will only be a problem if the second (or subsequent) alloc_tio()
> blocks waiting for an earlier allocation to complete.  This will only
> be a problem if multiple threads are each trying to allocate multiple
> dm_target_io from the same bioset at the same time.
> This is rare and should be easier to address than the current
> dm_offload() approach. 
> One possibility would be to copy the approach taken by
> crypt_alloc_buffer() which needs to allocate multiple entries from a
> mempool.
> It first tries the with GFP_NOWAIT.  If that fails it take a mutex and
> tries with GFP_NOIO.  This mean only one thread will try to allocate
> multiple bios at once, so there can be no deadlock.
> 
> Below are two RFC patches.  The first removes num_write_bios.
> The second is incomplete and makes a stab are allocating multiple bios
> at once safely.
> A third would be needed to remove dm_offload() etc... but I cannot quite
> fit that in today - must be off.

Great.

> From: NeilBrown <neilb@xxxxxxxx>
> Date: Wed, 22 Nov 2017 14:25:18 +1100
> Subject: [PATCH] DM: remove num_write_bios target interface.
> 
> No target provides num_write_bios and none has done
> since 2013.
> Having the possibility of num_write_bios > 1 complicates
> bio allocation.
> So remove the interface and assume there is only one bio
> needed.
> If a target ever needs more, it must provide a suitable
> bioset and allocate itself based on its particular needs.
> 
> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
> ---
>  drivers/md/dm.c               | 22 ++++------------------
>  include/linux/device-mapper.h | 15 ---------------
>  2 files changed, 4 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index b20febd6cbc7..8c1a05609eea 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1323,27 +1323,13 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
>  {
>  	struct bio *bio = ci->bio;
>  	struct dm_target_io *tio;
> -	unsigned target_bio_nr;
> -	unsigned num_target_bios = 1;
>  	int r = 0;
>  
> -	/*
> -	 * Does the target want to receive duplicate copies of the bio?
> -	 */
> -	if (bio_data_dir(bio) == WRITE && ti->num_write_bios)
> -		num_target_bios = ti->num_write_bios(ti, bio);
> -
> -	for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
> -		tio = alloc_tio(ci, ti, target_bio_nr);
> -		tio->len_ptr = len;
> -		r = clone_bio(tio, bio, sector, *len);
> -		if (r < 0) {
> -			free_tio(tio);
> -			break;
> -		}
> +	tio = alloc_tio(ci, ti, 0);
> +	tio->len_ptr = len;
> +	r = clone_bio(tio, bio, sector, *len);
> +	if (r >= 0)
>  		__map_bio(tio);
> -	}
> -

This bit is wrong, free_tio() is needed if clone_bio() fails.  I can fix
it up though.

I'll work through your patches tomorrow.

Thanks,
Mike



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux