On Fri, Feb 22 2019 at 10:22am -0500,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

> On Fri, Feb 22 2019 at 5:59am -0500,
> Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> 
> > Hi
> > 
> > I've found out that this patch causes timeout in the lvm test
> > shell/activate-missing-segment.sh and some others.
> 
> OK, I can reproduce with:
> make check_local T=shell/activate-missing-segment.sh
> 
> Interestingly, it isn't hung in kernel, test can be interrupted.
> And it is hanging at shell/activate-missing-segment.sh's : aux disable_dev "$dev1"
> 
> Other tests with "aux disable_dev" hang too
> (e.g. shell/lvcreate-repair.sh, shell/vgreduce-usage.sh)
> 
> continued below:
> 
> > On Wed, 20 Feb 2019, Mike Snitzer wrote:
> > 
> > > Leverage fact that dm_queue_split() enables use of noclone support even
> > > for targets that require additional splitting (via ti->max_io_len).
> > > 
> > > To achieve this move noclone processing from dm_make_request() to
> > > dm_process_bio() and check if bio->bi_end_io is already set to
> > > noclone_endio. This also fixes dm_wq_work() to be able to support
> > > noclone bios (because it calls dm_process_bio()).
> > > 
> > > While at it, clean up ->map() return processing to be comparable to
> > > __map_bio() and add some code comments that explain why certain
> > > conditionals exist to prevent the use of noclone.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > ---
> > >  drivers/md/dm.c | 103 ++++++++++++++++++++++++++++++++++----------------------
> > >  1 file changed, 62 insertions(+), 41 deletions(-)
> > > 
> > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > > index 57919f211acc..d84735be6927 100644
> > > --- a/drivers/md/dm.c
> > > +++ b/drivers/md/dm.c
> > > @@ -1764,14 +1764,75 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
> > >  	 * If in ->make_request_fn we need to use blk_queue_split(), otherwise
> > >  	 * queue_limits for abnormal requests (e.g. discard, writesame, etc)
> > >  	 * won't be imposed.
> > > +	 *
> > > +	 * For normal READ and WRITE requests we benefit from upfront splitting
> > > +	 * via dm_queue_split() because we can then leverage noclone support below.
> > >  	 */
> > > -	if (current->bio_list) {
> > > +	if (current->bio_list && (bio->bi_end_io != noclone_endio)) {
> > > +		/*
> > > +		 * It is fine to use {blk,dm}_queue_split() before opting to use noclone
> > > +		 * because any ->bi_private and ->bi_end_io will get saved off below.
> > > +		 * Once noclone_endio() is called the bio_chain's endio will kick in.
> > > +		 * - __split_and_process_bio() can split further, but any targets that
> > > +		 *   require late _DM_ initiated splitting (e.g. dm_accept_partial_bio()
> > > +		 *   callers) shouldn't set ti->no_clone.
> > > +		 */
> > >  		if (is_abnormal_io(bio))
> > >  			blk_queue_split(md->queue, &bio);
> > >  		else
> > >  			dm_queue_split(md, ti, &bio);
> > >  	}
> > > 
> > > +	if (dm_table_supports_noclone(map) &&
> > > +	    (bio_op(bio) == REQ_OP_READ || bio_op(bio) == REQ_OP_WRITE) &&
> > > +	    likely(!(bio->bi_opf & REQ_PREFLUSH)) &&
> > > +	    likely(!bio_integrity(bio)) && /* integrity requires specialized processing */
> > > +	    likely(!dm_stats_used(&md->stats))) { /* noclone doesn't support dm-stats */
> > > +		int r;
> > > +		struct dm_noclone *noclone;
> > > +
> > > +		/* Already been here if bio->bi_end_io is noclone_endio, reentered via dm_wq_work() */
> > > +		if (bio->bi_end_io != noclone_endio) {
> 
> This conditional is causing it ^
> 
> Very strange...
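(A quick recap for anyone skimming: the fast path above never clones the bio -- it stashes the bio's completion and takes the bio over, then hands it straight to ti->type->map(). A rough sketch of that hook side follows; the struct layout, the orig_bi_private field and the noclone_hook_bio() helper are my shorthand based on the comments quoted above, not the literal patch.)

#include <linux/bio.h>
#include <linux/jiffies.h>
#include <linux/slab.h>
#include "dm-core.h"            /* as this would live in drivers/md/dm.c */

struct dm_noclone {
        struct mapped_device *md;
        unsigned long start_time;
        bio_end_io_t *orig_bi_end_io;   /* completion saved from the incoming bio */
        void *orig_bi_private;          /* assumed field, per "bi_private ... saved off" above */
        /* ti->per_io_data_size bytes of per-target data follow */
};

static void noclone_endio(struct bio *bio);     /* restores the saved fields on completion */

/*
 * Hypothetical helper showing what dm_process_bio() does on the fast path
 * before calling ti->type->map(ti, bio).
 */
static bool noclone_hook_bio(struct mapped_device *md, struct dm_target *ti,
                             struct bio *bio)
{
        struct dm_noclone *noclone;

        /* Allocation mirrors the patch: per-target data plus a trailing unsigned */
        noclone = kmalloc_node(sizeof(*noclone) + ti->per_io_data_size + sizeof(unsigned),
                               GFP_NOWAIT, md->numa_node_id);
        if (unlikely(!noclone))
                return false;   /* caller falls back to the clone-based path */

        noclone->md = md;
        noclone->start_time = jiffies;
        /*
         * Save the caller's completion.  In a stacked setup this can already
         * be an upper DM device's noclone_endio -- that is fine, it gets
         * restored and re-run when this layer's noclone_endio fires.
         */
        noclone->orig_bi_end_io = bio->bi_end_io;
        noclone->orig_bi_private = bio->bi_private;

        bio->bi_end_io = noclone_endio;
        bio->bi_private = noclone;
        return true;
}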
Added some debugging:

Feb 22 10:46:28 thegoat kernel: device-mapper: core: dm_process_bio: 253:2: bio=00000000c70f3c09 bio->bi_iter.bi_sector=0 len=256 bio->bi_end_io != noclone_endio
Feb 22 10:46:28 thegoat kernel: device-mapper: core: dm_process_bio: 253:0: bio=00000000c70f3c09 bio->bi_iter.bi_sector=2048 len=256 bio->bi_end_io = noclone_endio
...
Feb 22 10:46:28 thegoat kernel: device-mapper: core: dm_process_bio: 253:2: bio=00000000da9cadf3 bio->bi_iter.bi_sector=0 len=256 bio->bi_end_io != noclone_endio
Feb 22 10:46:28 thegoat kernel: device-mapper: core: dm_process_bio: 253:0: bio=00000000da9cadf3 bio->bi_iter.bi_sector=2048 len=256 bio->bi_end_io = noclone_endio

So the same noclone bio is entering devices in the stacked DM device
(perfectly normal).

This fixes the issue:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a9856f1986df..8fbe4d5ec8e4 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1831,15 +1831,16 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 	    likely(!bio_integrity(bio)) && /* integrity requires specialized processing */
 	    likely(!dm_stats_used(&md->stats))) { /* noclone doesn't support dm-stats */
 		int r;
-		struct dm_noclone *noclone;
-
-		/* Already been here if bio->bi_end_io is noclone_endio, reentered via dm_wq_work() */
-		if (bio->bi_end_io != noclone_endio) {
+		/*
+		 * Only allocate noclone if in ->make_request_fn, otherwise
+		 * leak could occur due to reentering (e.g. from dm_wq_work)
+		 */
+		if (current->bio_list) {
+			struct dm_noclone *noclone;
 			noclone = kmalloc_node(sizeof(*noclone) + ti->per_io_data_size + sizeof(unsigned),
 					       GFP_NOWAIT, md->numa_node_id);
 			if (unlikely(!noclone))
 				goto no_fast_path;
-
 			noclone->md = md;
 			noclone->start_time = jiffies;
 			noclone->orig_bi_end_io = bio->bi_end_io;

That said, I've reasoned through other edge cases that need to be dealt
with and will layer other fixes in addition to this one.
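To make the stacking point concrete: every DM layer that takes the fast path saves the previous bi_end_io/bi_private pair and points them at its own noclone, so completion has to unwind one layer at a time. Roughly (a paraphrase of the idea rather than the actual noclone_endio(), reusing the struct sketched earlier; accounting and any per-target end_io processing are elided):

static void noclone_endio(struct bio *bio)
{
        struct dm_noclone *noclone = bio->bi_private;

        /*
         * Put back whatever this layer saved when it hooked the bio.  For the
         * lower device in a stack that is the upper device's noclone_endio and
         * noclone, so completion unwinds one DM layer at a time until the
         * original submitter's bi_end_io finally runs.
         */
        bio->bi_end_io = noclone->orig_bi_end_io;
        bio->bi_private = noclone->orig_bi_private;

        kfree(noclone);

        /* Re-run completion with the restored fields. */
        bio_endio(bio);
}

Which is why the gate really is "am I being called from ->make_request_fn?" (current->bio_list), not "has someone already hooked this bio?" -- in a stack the latter is true for every device below the top, and those devices still need their own noclone.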