On Wed, Jan 26 2011 at 5:03am -0500,
Tejun Heo <tj@xxxxxxxxxx> wrote:

> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 72dd23b..f507888 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -764,7 +764,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> >  	struct request_list *rl = &q->rq;
> >  	struct io_context *ioc = NULL;
> >  	const bool is_sync = rw_is_sync(rw_flags) != 0;
> > -	int may_queue, priv;
> > +	int may_queue, priv = 0;
> >
> >  	may_queue = elv_may_queue(q, rw_flags);
> >  	if (may_queue == ELV_MQUEUE_NO)
> > @@ -808,9 +808,14 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> >  	rl->count[is_sync]++;
> >  	rl->starved[is_sync] = 0;
> >
> > -	priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
> > -	if (priv)
> > -		rl->elvpriv++;
> > +	/*
> > +	 * Skip elevator initialization for flush requests
> > +	 */
> > +	if (!(bio && (bio->bi_rw & (REQ_FLUSH | REQ_FUA)))) {
> > +		priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
> > +		if (priv)
> > +			rl->elvpriv++;
> > +	}
>
> I thought about doing it this way but I think we're burying the
> REQ_FLUSH|REQ_FUA test logic too deep. get_request() shouldn't
> "magically" know not to allocate elevator data.

There is already a considerable amount of REQ_FLUSH|REQ_FUA special
casing magic sprinkled throughout the block layer.  Why is this
get_request() change the case that goes too far?

> The decision should
> be made higher in the stack and passed down to get_request(). e.g. if
> REQ_SORTED is set in @rw, elevator data is allocated; otherwise, not.

Considering REQ_SORTED is set in elv_insert(), well after get_request()
is called, I'm not seeing what you're suggesting.

Anyway, I agree that ideally we'd have a mechanism to explicitly
short-circuit elevator initialization.  But doing so in a meaningful way
would likely require a fair amount of refactoring of get_request* and
its callers.
I'll come back to this and have another look, but my gut is that this
interface churn wouldn't _really_ help -- all things considered.

> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 8a082a5..0c569ec 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -99,25 +99,29 @@ struct request {
> >  	/*
> >  	 * The rb_node is only used inside the io scheduler, requests
> >  	 * are pruned when moved to the dispatch queue. So let the
> > -	 * flush fields share space with the rb_node.
> > +	 * completion_data share space with the rb_node.
> >  	 */
> >  	union {
> >  		struct rb_node rb_node;	/* sort/lookup */
> > -		struct {
> > -			unsigned int		seq;
> > -			struct list_head	list;
> > -		} flush;
> > +		void *completion_data;
> >  	};
> >
> > -	void *completion_data;
> > -
> >  	/*
> >  	 * Three pointers are available for the IO schedulers, if they need
> > -	 * more they have to dynamically allocate it.
> > +	 * more they have to dynamically allocate it.  Let the flush fields
> > +	 * share space with these three pointers.
> >  	 */
> > -	void *elevator_private;
> > -	void *elevator_private2;
> > -	void *elevator_private3;
> > +	union {
> > +		struct {
> > +			void *private;
> > +			void *private2;
> > +			void *private3;
> > +		} elevator;
> > +		struct {
> > +			unsigned int		seq;
> > +			struct list_head	list;
> > +		} flush;
> > +	};
>
> Another thing is, can we please make private* an array? The number
> postfixes are irksome. It's even one based instead of zero!

Sure, I can sort that out.

> > Also, it would be great to better describe the lifetime difference
> > between the first and the second unions and why it has to be organized
> > this way (rb_node and completion_data can live together but rb_node
> > and flush can't).
>
> Oops, what can't live together are elevator_private* and
> completion_data.

I'll better describe the 2nd union's sharing in the next revision.
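To make the sharing in the second union concrete, here is a rough
userspace sketch.  It is not the next revision of the patch: the names
request_sketch, rb_node_sketch and rq_init_flush() are hypothetical,
list_head is a local stand-in for the kernel type, and the zero-based
elevator_private[] array is only one possible answer to the request to
drop the numbered postfixes.  The point it tries to show is that the
flush fields fit in the space of the three elevator pointers, which is
only safe if a flush request never has elevator data initialized (what
the get_request() hunk arranges).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

/* Stand-in for struct rb_node: three words, layout assumed for illustration. */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *right, *left;
};

/* Hypothetical cut-down request showing only the two unions under discussion. */
struct request_sketch {
	/* First union: rb_node and completion_data share space. */
	union {
		struct rb_node_sketch rb_node;	/* sort/lookup */
		void *completion_data;
	};

	/*
	 * Second union: the elevator's private pointers and the flush
	 * fields share space.  A request uses one member or the other,
	 * never both.
	 */
	union {
		void *elevator_private[3];	/* zero-based array (assumed naming) */
		struct {
			unsigned int seq;
			struct list_head list;
		} flush;
	};
};

/* The flush fields must not grow the struct beyond the three pointers. */
_Static_assert(sizeof(((struct request_sketch *)0)->flush) <=
	       sizeof(((struct request_sketch *)0)->elevator_private),
	       "flush fields must fit in the elevator pointers' space");

/*
 * Hypothetical helper: set up the flush fields of a request that skipped
 * elevator initialization.  This overwrites elevator_private[].
 */
static void rq_init_flush(struct request_sketch *rq, unsigned int seq)
{
	rq->flush.seq = seq;
	rq->flush.list.next = &rq->flush.list;
	rq->flush.list.prev = &rq->flush.list;
}
```

Calling rq_init_flush() on a request whose elevator_private[] is already
in use would clobber the elevator's pointers, which is why the two users
of the second union have to be mutually exclusive.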
Mike