On 11/12/2013 08:48 AM, Junichi Nomura wrote:
> On 11/08/13 18:02, Hannes Reinecke wrote:
>> There is no reason why multipath needs to queue requests
>> internally for queue_if_no_path or pg_init; we should
>> rather push them back onto the request queue.
>>
>> This patch removes the internal queueing mechanism from dm-multipath.
>
> Hi Hannes,
> thanks for working on this.
>
>> In doing so we can also remove the ->busy check, as a requeue is
>> identical to ->busy returning 'true' from the caller's perspective.
>> This simplifies the code by quite a lot.
>
> They are not identical.
> ->busy returns true when the underlying path cannot dispatch a request
> immediately. In that case it's better to keep the request in the queue
> to allow merges. (It used to have a real impact on buffered sequential
> write + fsync workloads, though the performance impact might be smaller
> in recent kernels due to the plugging changes.)
> Also, ->busy can be used by an upper layer (dm_table_any_busy_target),
> so you can't just remove it.

But they are identical from the caller's perspective:

drivers/md/dm.c:dm_request_fn()

                if (ti->type->busy && ti->type->busy(ti))
                        goto delay_and_out;

                clone = dm_start_request(md, rq);

                spin_unlock(q->queue_lock);
                if (map_request(ti, clone, md))
                        goto requeued;

                BUG_ON(!irqs_disabled());
                spin_lock(q->queue_lock);
        }

        goto out;

requeued:
        BUG_ON(!irqs_disabled());
        spin_lock(q->queue_lock);

delay_and_out:
        blk_delay_queue(q, HZ / 10);

So the only difference between ->busy and returning 'DM_MAPIO_REQUEUE'
is that ->busy runs under the queue_lock.

And yes, in theory ->busy might be used from any upper layer; but so far
the only caller I've found _is_ in dm_request_fn(), so removing it
doesn't do any harm. Unless I've misread something ...

>> @@ -385,21 +378,15 @@ static int map_io(struct multipath *m, struct request *clone,
>>  	    (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
>>  		__choose_pgpath(m, nr_bytes);
>>
>> -	pgpath = m->current_pgpath;
>> -
>> -	if (was_queued)
>> -		m->queue_size--;
>> +	if (m->current_pgpath &&
>> +	    m->pg_init_required && !m->pg_init_in_progress)
>> +		__pg_init_all_paths(m);
>>
>> +	pgpath = m->current_pgpath;
>>  	if ((pgpath && m->queue_io) ||
>>  	    (!pgpath && m->queue_if_no_path)) {
>> -		/* Queue for the daemon to resubmit */
>> -		list_add_tail(&clone->queuelist, &m->queued_ios);
>> -		m->queue_size++;
>> -		if ((m->pg_init_required && !m->pg_init_in_progress) ||
>> -		    !m->queue_io)
>> -			queue_work(kmultipathd, &m->process_queued_ios);
>>  		pgpath = NULL;
>> -		r = DM_MAPIO_SUBMITTED;
>> +		r = DM_MAPIO_REQUEUE;
>
> The if/else sequence in map_io() might be easier to read if we do:
>
> 	if (pgpath) {
> 		if (pg_ready(m)) {
> 			... // remap
> 			r = DM_MAPIO_REMAPPED;
> 		} else {
> 			__pg_init_all_paths(m);
> 			r = DM_MAPIO_REQUEUE;
> 		}
> 	} else { /* no path */
> 		if (need_requeue(m))
> 			r = DM_MAPIO_REQUEUE;
> 		else
> 			r = -EIO;
> 	}
>
> where:
>
> #define pg_ready(m) (!m->queue_io) /* or rename 'queue_io' */
> #define need_requeue(m) (m->queue_if_no_path || __must_push_back(m))
>
> and move the pg_init_required etc. checks to __pg_init_all_paths().

True. Will be doing that.
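
To make sure we mean the same thing, map_io() would then boil down to
something like this. Untested sketch only: pg_ready() and need_requeue()
are your placeholder names from above, the remapping lines are just
copied over from the current code, and the now-unused 'was_queued'
argument is dropped:

static int map_io(struct multipath *m, struct request *clone,
                  union map_info *map_context)
{
        int r;
        size_t nr_bytes = blk_rq_bytes(clone);
        struct dm_mpath_io *mpio = map_context->ptr;
        struct block_device *bdev;
        struct pgpath *pgpath;
        unsigned long flags;

        spin_lock_irqsave(&m->lock, flags);

        /* Do we need to select a new pgpath? */
        if (!m->current_pgpath ||
            (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
                __choose_pgpath(m, nr_bytes);

        pgpath = m->current_pgpath;
        if (pgpath) {
                if (pg_ready(m)) {
                        /* Remap to the chosen path, as before */
                        bdev = pgpath->path.dev->bdev;
                        clone->q = bdev_get_queue(bdev);
                        clone->rq_disk = bdev->bd_disk;
                        mpio->pgpath = pgpath;
                        mpio->nr_bytes = nr_bytes;
                        r = DM_MAPIO_REMAPPED;
                } else {
                        /* pg_init checks have moved into the callee */
                        __pg_init_all_paths(m);
                        r = DM_MAPIO_REQUEUE;
                }
        } else {
                if (need_requeue(m))
                        r = DM_MAPIO_REQUEUE;
                else
                        r = -EIO;       /* Failed */
        }

        spin_unlock_irqrestore(&m->lock, flags);

        return r;
}

And __pg_init_all_paths() would absorb the checks; again just a sketch,
with the pg_init_delay handling kept as it is today:

static void __pg_init_all_paths(struct multipath *m)
{
        struct pgpath *pgpath;
        unsigned long pg_init_delay = 0;

        /* Moved here from process_queued_ios() / map_io() */
        if (!m->pg_init_required || m->pg_init_in_progress)
                return;

        m->pg_init_count++;
        m->pg_init_required = 0;
        if (m->pg_init_delay_retry)
                pg_init_delay = msecs_to_jiffies(m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT ?
                                                 m->pg_init_delay_msecs : DM_PG_INIT_DELAY_MSECS);
        list_for_each_entry(pgpath, &m->current_pg->pgpaths, list) {
                /* Skip failed paths */
                if (!pgpath->is_active)
                        continue;
                if (queue_delayed_work(kmpath_handlerd, &pgpath->activate_path,
                                       pg_init_delay))
                        m->pg_init_in_progress++;
        }
}

(The path-selector start_io accounting is left out of the sketches for
brevity.)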

>> @@ -1241,7 +1168,8 @@ static void pg_init_done(void *data, int errors)
>>  		m->queue_io = 0;
>>
>>  	m->pg_init_delay_retry = delay_retry;
>> -	queue_work(kmultipathd, &m->process_queued_ios);
>> +	if (!m->queue_io)
>> +		dm_table_run_queue(m->ti->table);
>>
>>  	/*
>>  	 * Wake up any thread waiting to suspend.
>
> process_queued_ios was responsible for retrying pg_init.
> And when retrying, m->queue_io is still 0.
> So don't we have to run the queue unconditionally here,
> or call __pg_init_all_paths() directly?

In my rework I've _tried_ to separate both functions from
process_queued_ios(). But yes, you are right; I hadn't considered the
pg_init retry case. Will be updating the patch.

>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index b3e26c7..20a19bd 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1876,6 +1876,19 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
>>  	return r;
>>  }
>>
>> +void dm_table_run_queue(struct dm_table *t)
>> +{
>> +	struct mapped_device *md = dm_table_get_md(t);
>> +	unsigned long flags;
>> +
>> +	if (md->queue) {
>> +		spin_lock_irqsave(md->queue->queue_lock, flags);
>> +		blk_run_queue_async(md->queue);
>> +		spin_unlock_irqrestore(md->queue->queue_lock, flags);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(dm_table_run_queue);
>
> Shouldn't this be in dm-table.c?

It would be logical. But as 'struct mapped_device' is internal to dm.c,
you would then end up with something like

	void dm_run_queue(struct mapped_device *md);

and I would need to call it as

	dm_run_queue(dm_table_get_md(t));

which I felt was a bit pointless, as this would be the _only_ valid
calling sequence. Hence I moved the call to dm_table_get_md() into the
function, even though it means a possible layering violation.

Oh, the joys of device-mapper ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel