On Tue, Sep 20, 2016 at 01:44:36PM +0200, Alexander Gordeev wrote:
> On Mon, Sep 19, 2016 at 10:48:49AM -0700, Omar Sandoval wrote:
> > On Sun, Sep 18, 2016 at 09:37:14AM +0200, Alexander Gordeev wrote:
> > > Currently maximum number of used hardware queues is limited to
> > > number of CPUs in the system. However, using 'nr_cpu_ids' as
> > > the limit for (de-)allocations of data structures instead of
> > > existing data structures' counters (a) worsens readability and
> > > (b) leads to unused memory when number of hardware queues is
> > > less than number of CPUs.
> > >
> > > CC: linux-block@xxxxxxxxxxxxxxx
> > > Signed-off-by: Alexander Gordeev <agordeev@xxxxxxxxxx>
> > > ---
> > >  block/blk-mq.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 276ec7b..2c77b68 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -2054,8 +2054,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> > >  	if (!q->queue_ctx)
> > >  		goto err_exit;
> > >
> > > -	q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
> > > -			GFP_KERNEL, set->numa_node);
> > > +	q->queue_hw_ctx = kzalloc_node(set->nr_hw_queues *
> > > +			sizeof(*(q->queue_hw_ctx)), GFP_KERNEL, set->numa_node);
> > >  	if (!q->queue_hw_ctx)
> > >  		goto err_percpu;
> > >
> > > @@ -2319,7 +2319,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
> > >  	if (set->nr_hw_queues > nr_cpu_ids)
> > >  		set->nr_hw_queues = nr_cpu_ids;
> > >
> > > -	set->tags = kzalloc_node(nr_cpu_ids * sizeof(struct blk_mq_tags *),
> > > +	set->tags = kzalloc_node(set->nr_hw_queues * sizeof(*set->tags),
> > >  				 GFP_KERNEL, set->numa_node);
> > >  	if (!set->tags)
> > >  		return -ENOMEM;
> > >
> > > @@ -2360,7 +2360,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
> > >  {
> > >  	int i;
> > >
> > > -	for (i = 0; i < nr_cpu_ids; i++) {
> > > +	for (i = 0; i < set->nr_hw_queues; i++) {
> > >  		if (set->tags[i])
> > >  			blk_mq_free_rq_map(set, set->tags[i], i);
> > >  	}
> >
> > I don't think this is safe since we might increase the number of
> > hardware queues (blk_mq_update_nr_hw_queues()).
>
> It is safe, because blk_mq_update_nr_hw_queues() limits number of
> hardware queues to nr_cpu_ids. But still using nr_cpu_ids is wrong,
> because (a) set->nr_hw_queues could be less than nr_cpu_ids and
> (b) it is set->nr_hw_queues counter that tracks size of the array.

But say we originally call blk_mq_init_allocated_queue() with
set->nr_hw_queues = 1 and later we call blk_mq_update_nr_hw_queues()
with nr_hw_queues = nr_cpu_ids (say, 4). Then we update (line numbers
from v4.8-rc7):

2406                set->nr_hw_queues = nr_hw_queues;

So, when we call blk_mq_realloc_hw_ctxs(), we do:

1998        for (i = 0; i < set->nr_hw_queues; i++) {
1999                int node;
2000
2001                if (hctxs[i])
2002                        continue;
2003
2004                node = blk_mq_hw_queue_to_node(q->mq_map, i);
2005                hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
2006                                GFP_KERNEL, node);

Now set->nr_hw_queues = 4 but we only allocated 1 hardware queue, so
we'll scribble off the end of the array.
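
To make the failure mode concrete, here is a minimal userspace sketch of
that pattern (the struct and function names below are invented for
illustration; this is not the blk-mq code):

#include <stdlib.h>

struct fake_set {
        unsigned int nr_hw_queues;
        void **tags;            /* stands in for set->tags / q->queue_hw_ctx */
};

static void fake_update_nr_hw_queues(struct fake_set *set, unsigned int nr)
{
        unsigned int i;

        set->nr_hw_queues = nr;                 /* count bumped first, as at line 2406 */

        for (i = 0; i < set->nr_hw_queues; i++) /* loop runs to the new count... */
                if (!set->tags[i])              /* ...but tags[] still has the old size */
                        set->tags[i] = malloc(64);
}

int main(void)
{
        struct fake_set set = { .nr_hw_queues = 1 };

        /* with the proposed patch, only nr_hw_queues (= 1) pointers are allocated */
        set.tags = calloc(set.nr_hw_queues, sizeof(*set.tags));
        if (!set.tags)
                return 1;

        /* growing to 4 queues touches tags[1..3], past the end of the allocation */
        fake_update_nr_hw_queues(&set, 4);
        return 0;
}

Building this with -fsanitize=address should flag the heap buffer
overflow immediately.
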

Similar problem in blk_mq_map_swqueue():

1850        queue_for_each_hw_ctx(q, hctx, i) {
1851                struct blk_mq_ctxmap *map = &hctx->ctx_map;
1852
1853                /*
1854                 * If no software queues are mapped to this hardware queue,
1855                 * disable it and free the request entries.
1856                 */
1857                if (!hctx->nr_ctx) {
1858                        if (set->tags[i]) {
1859                                blk_mq_free_rq_map(set, set->tags[i], i);
1860                                set->tags[i] = NULL;
1861                        }
1862                        hctx->tags = NULL;
1863                        continue;
1864                }
1865
1866                /* unmapped hw queue can be remapped after CPU topo changed */
1867                if (!set->tags[i])
1868                        set->tags[i] = blk_mq_init_rq_map(set, i);

q->nr_hw_queues would now be 4, but we only allocated one element for
set->tags.

Allocating nr_cpu_ids upfront is the easiest way to handle adding
hardware queues up to nr_cpu_ids as needed. Both of these are just
arrays of pointers, so it's not like it's a huge waste of memory.

-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html