Thanks for reporting this. Much appreciated. More comments below. On Thu, Oct 17 2013 at 1:31pm -0400, Merla, ShivaKrishna <ShivaKrishna.Merla@xxxxxxxxxx> wrote: > Whenever multipath_dtr is happening, we should prevent queueing any further path > activation work. There was a kernel panic where after pg_init_done() decrements > pg_init_in_progress to 0, wait_for_pg_init_completion call assumes there are no > more pending path management commands. But if pg_init_required is set by > pg_init_done call due to retriable mode_select errors , then process_queued_ios() > will again queue the path activation work. If free_multipath call has been > completed by the time activate_path work is called, kernel panic was seen on > accessing multipath members. Your locking looks suspect to me, see comment inlined below multipath_dtr But shouldn't we just train multipath_wait_for_pg_init_completion() to look at m->pg_init_required? Have it wait for both pg_init_required and pg_init_in_progress to be zero? We'd also have to audit that pg_init_required cannot be set while pg_init_in_progress. > BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 > RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath] > [<ffffffff81090ac0>] worker_thread+0x170/0x2a0 > [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40 > > This patch will fix the issue by preventing any further path activations when > multipath structures are being freed during table suspend/reload. > > Signed-off-by: Shiva Krishna Merla<shivakrishna.merla@xxxxxxxxxx> > Reviewed-by: Krishnasamy Somasundaram<somasundaram.krishnasamy@xxxxxxxxxx> > Tested-by: Speagle Andy<Andy.Speagle@xxxxxxxxxx> > > --- > --- a/drivers/md/dm-mpath.c 2013-01-29 10:12:10.000000000 -0600 > +++ b/drivers/md/dm-mpath.c 2013-10-17 09:23:21.896062928 -0500 > @@ -73,6 +73,7 @@ struct multipath { > > wait_queue_head_t pg_init_wait; /* Wait for pg_init completion */ > > + unsigned dtr_in_progress; /* multipath destroy in progress */ > unsigned pg_init_required; /* pg_init needs calling? */ > unsigned pg_init_in_progress; /* Only one pg_init allowed at once */ > unsigned pg_init_delay_retry; /* Delay pg_init retry? */ > @@ -498,7 +499,8 @@ static void process_queued_ios(struct wo > (!pgpath && !m->queue_if_no_path)) > must_queue = 0; > > - if (m->pg_init_required && !m->pg_init_in_progress && pgpath) > + if (m->pg_init_required && !m->pg_init_in_progress && pgpath && > + !m->dtr_in_progress) > __pg_init_all_paths(m); > > spin_unlock_irqrestore(&m->lock, flags); > @@ -952,6 +954,7 @@ static void multipath_dtr(struct dm_targ > { > struct multipath *m = ti->private; > > + m->dtr_in_progress = 1; > flush_multipath_work(m); > free_multipath(m); > } Don't we need synchronization in multipath_dtr? Otherwise isn't there still potential for a narrow race when checking m->dtr_in_progress from process_queued_ios() or pg_init_limit_reached()? Anyway, my concerns should be moot if multipath_wait_for_pg_init_completion() is updated to look at pg_init_required. But I could be missing something. Mikulas (or Hannes or NEC guys), would welcome your take on this. > @@ -1164,7 +1167,7 @@ static int pg_init_limit_reached(struct > > spin_lock_irqsave(&m->lock, flags); > > - if (m->pg_init_count <= m->pg_init_retries) > + if (m->pg_init_count <= m->pg_init_retries && !m->dtr_in_progress) > m->pg_init_required = 1; > else > limit_reached = 1; > -- -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel