On 2005-06-13T19:28:51, Lars Marowsky-Bree <lmb@xxxxxxx> wrote:

> Hi Alasdair, Ed, List,
>
> this patch addresses the bug Ed pointed out. The trigger_event queue
> might still hold a reference while dm_destroy kills the current mapping.
>
> However, this patch is unconvincing in the larger view of things. Yes,
> barring bugs, it should address the problem, but the real issue is -
> just like with still having pg_inits in-flight - that DM's reference
> counting isn't taking into account what we are doing internally, and
> that remains unsolved.
>
> Comments?

Second (public) revision. The wait_on_work() approach was no good: while it
considerably reduced the window for the race, it didn't close it entirely,
because the pending flag is cleared before the handler is actually called.

The earlier comment still stands: the layering in DM is sub-optimal, if not
wrong. I tried exporting dm_(inc|dec)_pending(struct mapped_device *md) from
dm.c, so as to be able to adjust the reference counter for the dm device when
queuing / completing an internal event, but this turns into a mess: deep
inside dm-mpath, I don't have a reference to that struct at all. This is the
same problem which makes sane _logging_ in dm-mpath a nightmare: no reference
to the name of the DM device etc. This all sucks(tm).

However, I needed a fix reasonably soon (as in, right now ;-) which is
somewhat easier to verify than revamping the full DM layering. Comments on
this approach are appreciated.

And yes, I know that the "looping" in free_multipath() sucks. See above for
the reason: it's easier to verify than the full answer. A
flush_workqueue(kmultipathd) is too large a hammer - given that other tables
might still be busy queuing events, re-queuing IO etc., it could potentially
block for much longer periods of time.
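For anyone not steeped in the kernel side, the shape of the fix can be sketched in plain user-space C. This is only a rough analogue, not the kernel code: C11 atomics and a pthread stand in for atomic_t and the kmultipathd workqueue, and every name here (fake_multipath, table_data, the worker thread) is made up for illustration. The ordering is what matters: the queueing side sets a pending flag before queueing, the handler clears it only *after* it has finished touching the object, and the destroy path spins until the flag is clear before freeing.

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_multipath {
	atomic_int trigger_event_pending;	/* 1 while a handler is queued or running */
	int table_data;				/* stands in for m->ti->table */
};

static void trigger_event(struct fake_multipath *m)
{
	/* Dereference the object; safe only because free waits on the flag. */
	printf("event for table %d\n", m->table_data);
	/* Clear the flag strictly after the handler is done touching *m.
	 * Clearing it before this point reopens the race. */
	atomic_store(&m->trigger_event_pending, 0);
}

static void *worker(void *data)
{
	struct fake_multipath *m = data;

	usleep(1000);			/* widen the race window on purpose */
	trigger_event(m);
	return NULL;			/* never touches *m again after the clear */
}

static void queue_trigger_event(struct fake_multipath *m, pthread_t *t)
{
	/* Flag goes up before the work is queued, never after. */
	atomic_store(&m->trigger_event_pending, 1);
	pthread_create(t, NULL, worker, m);
}

static void free_multipath(struct fake_multipath *m)
{
	/* Analogue of the patch's loop: wait out any in-flight handler. */
	while (atomic_load(&m->trigger_event_pending))
		sched_yield();
	free(m);
}

int main(void)
{
	struct fake_multipath *m = calloc(1, sizeof(*m));
	pthread_t t;

	m->table_data = 42;
	queue_trigger_event(m, &t);
	free_multipath(m);		/* must not free while the handler is pending */
	pthread_join(t, NULL);
	puts("done");
	return 0;
}

Without the spin in free_multipath(), the free() could land between pthread_create() and trigger_event() and the handler would dereference freed memory - the same window the patch closes in the kernel.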
From: Lars Marowsky-Bree <lmb@xxxxxxx>
Subject: Panic in dm-mpath.c:trigger_event()
References: 88635

If a dm mpath table was destroyed while events were still pending to be
delivered, the code could panic in trigger_event() trying to dereference
free'd memory.

Index: linux-2.6.5/drivers/md/dm-mpath.c
===================================================================
--- linux-2.6.5.orig/drivers/md/dm-mpath.c	2005-06-13 18:14:58.555830390 +0200
+++ linux-2.6.5/drivers/md/dm-mpath.c	2005-06-14 10:57:44.879248517 +0200
@@ -80,6 +80,8 @@ struct multipath {
 
 	unsigned queue_size;
 	struct work_struct trigger_event;
+	atomic_t trigger_event_pending;	/* Reference count whether any
+					 * trigger_event is outstanding */
 
 	/*
 	 * We must use a mempool of mpath_io structs so that we
@@ -179,6 +180,7 @@ static struct multipath *alloc_multipath
 		m->queue_io = 1;
 		INIT_WORK(&m->process_queued_ios, process_queued_ios, m);
 		INIT_WORK(&m->trigger_event, trigger_event, m);
+		atomic_set(&m->trigger_event_pending, 0);
 		m->mpio_pool = mempool_create(MIN_IOS, mempool_alloc_slab,
 					      mempool_free_slab, _mpio_cache);
 		if (!m->mpio_pool) {
@@ -195,6 +197,14 @@ static void free_multipath(struct multip
 	struct priority_group *pg, *tmp;
 	struct hw_handler *hwh = &m->hw_handler;
 
+	/* TODO - Yes, I know:
+	 * It would be better if in fact pending trigger_events
+	 * were accounted for in the struct mapped_device * we belong
+	 * to. However, that is not exported to us, and that fix will
+	 * require slightly more rearchitecting than this bit.
+	 */
+	while (atomic_read(&m->trigger_event_pending))
+		schedule();
 	list_for_each_entry_safe (pg, tmp, &m->priority_groups, list) {
 		list_del(&pg->list);
 		free_priority_group(pg, m->ti);
@@ -421,6 +431,16 @@ static void trigger_event(void *data)
 	struct multipath *m = (struct multipath *) data;
 
 	dm_table_event(m->ti->table);
+
+	spin_lock(&m->lock);
+	atomic_set(&m->trigger_event_pending, 0);
+	spin_unlock(&m->lock);
+}
+
+static inline void queue_trigger_event(struct multipath *m)
+{
+	atomic_set(&m->trigger_event_pending, 1);
+	queue_work(kmultipathd, &m->trigger_event);
 }
 
 /*-----------------------------------------------------------------
@@ -818,7 +838,8 @@ static int fail_path(struct pgpa
 	if (pgpath == m->current_pgpath)
 		m->current_pgpath = NULL;
 
-	queue_work(kmultipathd, &m->trigger_event);
+
+	queue_trigger_event(m);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -857,7 +878,7 @@ static int reinstate_path(struct pgpath
 	if (!m->nr_valid_paths++)
 		queue_work(kmultipathd, &m->process_queued_ios);
 
-	queue_work(kmultipathd, &m->trigger_event);
+	queue_trigger_event(m);
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
@@ -900,9 +921,10 @@ static void bypass_pg(struct multipath *
 	m->current_pgpath = NULL;
 	m->current_pg = NULL;
 
+	queue_trigger_event(m);
+
 	spin_unlock_irqrestore(&m->lock, flags);
-
-	queue_work(kmultipathd, &m->trigger_event);
 }
 
 /*
@@ -930,9 +952,9 @@ static int switch_pg_num(struct multipat
 		m->current_pg = NULL;
 		m->next_pg = pg;
 	}
+	queue_trigger_event(m);
 
 	spin_unlock_irqrestore(&m->lock, flags);
-	queue_work(kmultipathd, &m->trigger_event);
 
 	return 0;
 }

Sincerely,
    Lars Marowsky-Brée <lmb@xxxxxxx>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business	-- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"