On 11/12/2013 09:17 AM, Hannes Reinecke wrote:
> On 11/12/2013 08:48 AM, Junichi Nomura wrote:
>> On 11/08/13 18:02, Hannes Reinecke wrote:
>>> There is no reason why multipath needs to queue requests
>>> internally for queue_if_no_path or pg_init; we should
>>> rather push them back onto the request queue.
>>>
>>> This patch removes the internal queueing mechanism from dm-multipath.
>>
>> Hi Hannes,
>> thanks for working on this.
>>
>>> In doing so we can also remove the ->busy check, as a requeue is identical
>>> to ->busy returning 'true' from the caller's perspective. This simplifies
>>> the code by quite a lot.
>>
>> They are not identical.
>> ->busy returns true when the underlying path cannot dispatch a request
>> immediately. In that case it is better to keep the request in the queue
>> to allow merges. (It used to have a real impact on buffered sequential
>> write + fsync workloads, though the performance impact might have become
>> smaller in recent kernels due to plugging changes.)
>> Also, ->busy can be used by an upper layer (dm_table_any_busy_target),
>> so you can't just remove it.
>>
> But they are identical from the caller's perspective:
> drivers/md/dm.c:dm_request_fn()
>
>		if (ti->type->busy && ti->type->busy(ti))
>			goto delay_and_out;
>
>		clone = dm_start_request(md, rq);
>
>		spin_unlock(q->queue_lock);
>		if (map_request(ti, clone, md))
>			goto requeued;
>
>		BUG_ON(!irqs_disabled());
>		spin_lock(q->queue_lock);
>	}
>
>	goto out;
>
> requeued:
>	BUG_ON(!irqs_disabled());
>	spin_lock(q->queue_lock);
>
> delay_and_out:
>	blk_delay_queue(q, HZ / 10);
>
> So the only difference between ->busy and returning 'DM_MAPIO_REQUEUE'
> is that 'busy' runs under the queue_lock.
>
> And yes, in theory ->busy might be used from any upper layer; but
> so far the only caller I've found _is_ in dm_request_fn().
> So removing it doesn't do any harm.
>
> Unless I've misread something ...
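The control flow quoted above can be reduced to a small userspace model (not kernel code; `dispatch_one` and `enum action` are illustrative names, and the map return values are stubbed as a local enum). It shows Hannes' point: from dm_request_fn()'s viewpoint, ->busy returning true and map_request() returning DM_MAPIO_REQUEUE end in the same place, blk_delay_queue(q, HZ / 10), with the request left on the queue; the real difference is only that ->busy is evaluated under the queue_lock, before the request is dequeued.

```c
#include <assert.h>

/* Simplified model of one iteration of drivers/md/dm.c:dm_request_fn().
 * Not kernel code: locking, dequeueing and the actual block-layer calls
 * are elided; we only model the branch structure.
 */

enum action { DISPATCHED, DELAYED };

enum map_result { DM_MAPIO_REMAPPED, DM_MAPIO_REQUEUE };

static enum action dispatch_one(int target_busy, enum map_result map_rv)
{
	if (target_busy)
		return DELAYED;		/* goto delay_and_out */
	if (map_rv == DM_MAPIO_REQUEUE)
		return DELAYED;		/* goto requeued -> delay_and_out */
	return DISPATCHED;		/* clone was mapped and issued */
}
```

Both the busy path and the requeue path yield the same observable outcome in this model, which is the basis for the argument that the ->busy hook is redundant once requests are pushed back.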
>
>>> @@ -385,21 +378,15 @@ static int map_io(struct multipath *m, struct request *clone,
>>>  	    (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
>>>  		__choose_pgpath(m, nr_bytes);
>>>
>>> -	pgpath = m->current_pgpath;
>>> -
>>> -	if (was_queued)
>>> -		m->queue_size--;
>>> +	if (m->current_pgpath &&
>>> +	    m->pg_init_required && !m->pg_init_in_progress)
>>> +		__pg_init_all_paths(m);
>>>
>>> +	pgpath = m->current_pgpath;
>>>  	if ((pgpath && m->queue_io) ||
>>>  	    (!pgpath && m->queue_if_no_path)) {
>>> -		/* Queue for the daemon to resubmit */
>>> -		list_add_tail(&clone->queuelist, &m->queued_ios);
>>> -		m->queue_size++;
>>> -		if ((m->pg_init_required && !m->pg_init_in_progress) ||
>>> -		    !m->queue_io)
>>> -			queue_work(kmultipathd, &m->process_queued_ios);
>>>  		pgpath = NULL;
>>> -		r = DM_MAPIO_SUBMITTED;
>>> +		r = DM_MAPIO_REQUEUE;
>>
>> The if/else sequence in map_io() might be easier to read if we do:
>>
>>	if (pgpath) {
>>		if (pg_ready(m)) {
>>			... /* remap */
>>			r = DM_MAPIO_REMAPPED;
>>		} else {
>>			__pg_init_all_paths(m);
>>			r = DM_MAPIO_REQUEUE;
>>		}
>>	} else { /* no path */
>>		if (need_requeue(m))
>>			r = DM_MAPIO_REQUEUE;
>>		else
>>			r = -EIO;
>>	}
>>
>> where:
>> #define pg_ready(m) (!m->queue_io)	/* or rename 'queue_io' */
>> #define need_requeue(m) (m->queue_if_no_path || __must_push_back(m))
>>
>> and move the pg_init_required, etc. checks to __pg_init_all_paths().
>>
> True. Will be doing that.
>
>>> @@ -1241,7 +1168,8 @@ static void pg_init_done(void *data, int errors)
>>>  		m->queue_io = 0;
>>>
>>>  	m->pg_init_delay_retry = delay_retry;
>>> -	queue_work(kmultipathd, &m->process_queued_ios);
>>> +	if (!m->queue_io)
>>> +		dm_table_run_queue(m->ti->table);
>>>
>>>  	/*
>>>  	 * Wake up any thread waiting to suspend.
>>
>> process_queued_ios was responsible for retrying pg_init,
>> and when retrying, m->queue_io is still 0.
>> So don't we have to run the queue unconditionally here,
>> or call __pg_init_all_paths() directly?
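Junichi's proposed restructuring of map_io() can be checked as a pure decision function in userspace. This is a sketch, not the dm-multipath implementation: `struct mp_state`, `map_io_model` and the `MAP_*` enum are hypothetical stand-ins for `struct multipath`, the real map_io() and the DM_MAPIO_* / -EIO return values, and `must_push_back` stubs what `__must_push_back(m)` would report.

```c
#include <assert.h>

enum map_rv { MAP_REMAPPED, MAP_REQUEUE, MAP_EIO };

/* Stand-in for the few struct multipath fields the decision depends on. */
struct mp_state {
	int has_path;		/* m->current_pgpath != NULL */
	int queue_io;		/* pg_init still outstanding */
	int queue_if_no_path;
	int must_push_back;	/* what __must_push_back(m) would return */
};

#define pg_ready(m)	(!(m)->queue_io)
#define need_requeue(m)	((m)->queue_if_no_path || (m)->must_push_back)

static enum map_rv map_io_model(const struct mp_state *m)
{
	if (m->has_path) {
		if (pg_ready(m))
			return MAP_REMAPPED;	/* remap to current path */
		/* __pg_init_all_paths(m) would be kicked here */
		return MAP_REQUEUE;
	}
	/* no usable path */
	if (need_requeue(m))
		return MAP_REQUEUE;
	return MAP_EIO;				/* fail the request: -EIO */
}
```

The point of the shape is that each of the four outcomes falls out of exactly one branch, instead of being spread across interleaved conditions on pgpath, queue_io and queue_if_no_path.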
>>
> In my rework I've _tried_ to separate both functions from
> process_queued_ios().
> But yes, you are right; I haven't considered pg_init retry.
> Will be updating the patch.
>
Actually, I have. I just had a closer look at the patch, and pg_init
retry is handled, albeit differently than in the original.

It now works like this:

->map_io() is called
  -> calls __switch_pg(), which sets 'queue_io'
  -> calls __pg_init_all_paths(), which pushes activate_path onto a
     workqueue
  -> returns DM_MAPIO_REQUEUE
-> pg_init_done()
  -> checks pg_init_required
    -> if non-zero, some other I/O already kicked off an
       'activate_path', so there is nothing to be done from our side
    -> if zero, we kick the queue via blk_run_queue()

And blk_run_queue() will call into ->request_fn, which will pull
requests off the queue. So the next request calls map_io() again, and
the entire game starts anew, retrying pg_init.

The only thing we're not handling properly is 'pg_init_delay_retry',
as for that we should have started the queue with a certain delay,
which we currently don't. But that's easily fixable.

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
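The retry cycle described in the message above can be sketched as a toy userspace model. This is an assumption-laden illustration, not dm code: `map_io_calls_until_mapped` and `fail_count` are invented names, the workqueue, pg_init_required bookkeeping and blk_run_queue() are collapsed into a simple loop, and `fail_count` stands for a hypothetical number of times path activation fails before pg_init_done() reports success and clears queue_io.

```c
#include <assert.h>

/* Toy model of the cycle: map_io() sets queue_io via __switch_pg(),
 * schedules activate_path and returns DM_MAPIO_REQUEUE; pg_init_done()
 * either retries activation or clears queue_io and kicks the queue,
 * whereupon ->request_fn re-issues the request into map_io().
 * Returns how many map_io() invocations occur before one finally maps.
 */
static int map_io_calls_until_mapped(int fail_count)
{
	int queue_io = 1;	/* set by __switch_pg() */
	int calls = 0;

	while (queue_io) {
		calls++;		/* map_io() -> __pg_init_all_paths(),
					 * returns DM_MAPIO_REQUEUE */
		/* activate_path() runs from the workqueue ... */
		if (fail_count > 0)
			fail_count--;	/* pg_init_done(errors): retry */
		else
			queue_io = 0;	/* pg_init_done(0): pg is ready,
					 * blk_run_queue() re-runs the queue */
	}
	return calls + 1;	/* the final map_io() remaps the request */
}
```

The model also makes the acknowledged gap visible: each retry round is driven by an immediate blk_run_queue(), so the pg_init_delay_retry case would need the queue restarted with a delay instead.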