On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
> On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
>> To optimize performance, set the affinity of the block device tagset
>> according to the virtio device affinity.
>>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
>> ---
>>  drivers/block/virtio_blk.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 9b3bd083b411..1c68c3e0ebf9 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>>  	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
>>  	vblk->tag_set.ops = &virtio_mq_ops;
>>  	vblk->tag_set.queue_depth = queue_depth;
>> -	vblk->tag_set.numa_node = NUMA_NO_NODE;
>> +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
>
> I'm afraid that by doing this you will increase the chances of hitting
> OOM: with NUMA_NO_NODE, the MM will try to allocate memory across the
> whole system, while in the latter mode it allocates only on the specific
> NUMA node, which can be depleted.

This is a common methodology we use in the block layer and in the NVMe
subsystem, and we are not afraid of the OOM issue you raised.

This is not new, and I guess the kernel MM will (or at least should)
handle the fallback you raised.
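
To be precise about that fallback: a node passed to the kernel slab
allocators is a preference, not a hard constraint, unless __GFP_THISNODE
is also set. Rough sketch (not from this patch, names made up):

	/* Without __GFP_THISNODE the page allocator merely prefers @node
	 * and falls back to the other nodes once it is depleted; with the
	 * flag set it fails instead of falling back.
	 */
	buf = kzalloc_node(size, GFP_KERNEL, node);
	buf = kzalloc_node(size, GFP_KERNEL | __GFP_THISNODE, node);

blk-mq passes plain GFP_KERNEL-style flags for the tagset allocations,
so a depleted node should fall back rather than OOM.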
Anyway, if we're doing this in NVMe, I don't see a reason to be afraid
of doing it in virtio-blk.
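
For reference, as far as I remember the analogous nvme-pci code looks
roughly like this (simplified, quoted from memory):

	/* drivers/nvme/host/pci.c: tie the tagset to the NUMA node of
	 * the underlying PCI device.
	 */
	dev->tagset.numa_node = dev_to_node(dev->dev);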
Also, I sent a patch a few weeks ago that decreases virtio-blk's memory
consumption, so I guess we'll be just fine.
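
For anyone reading along: virtio_dev_to_node() is the helper introduced
by this series; a minimal sketch of what it boils down to (simplified,
since a virtio device has no NUMA affinity of its own, it takes the node
of the parent bus device, e.g. the PCI function):

	static inline int virtio_dev_to_node(struct virtio_device *vdev)
	{
		return dev_to_node(vdev->dev.parent);
	}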
Thanks

>>  	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>>  	vblk->tag_set.cmd_size =
>>  		sizeof(struct virtblk_req) +
>> --
>> 2.18.1