On Mon, Sep 27, 2021 at 08:25:09PM +0300, Max Gurtovoy wrote:
>
> On 9/27/2021 2:34 PM, Leon Romanovsky wrote:
> > On Sun, Sep 26, 2021 at 05:55:18PM +0300, Max Gurtovoy wrote:
> > > To optimize performance, set the affinity of the block device tagset
> > > according to the virtio device affinity.
> > >
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
> > > ---
> > >  drivers/block/virtio_blk.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 9b3bd083b411..1c68c3e0ebf9 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -774,7 +774,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> > >  	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> > >  	vblk->tag_set.ops = &virtio_mq_ops;
> > >  	vblk->tag_set.queue_depth = queue_depth;
> > > -	vblk->tag_set.numa_node = NUMA_NO_NODE;
> > > +	vblk->tag_set.numa_node = virtio_dev_to_node(vdev);
> > I'm afraid that by doing this, you will increase the chances of seeing OOM,
> > because with NUMA_NO_NODE, MM will try to allocate memory across the whole
> > system, while in the latter mode it will allocate only on a specific NUMA
> > node, which can be depleted.
>
> This is a common methodology we use in the block layer and in the NVMe
> subsystem, and we aren't afraid of the OOM issue you raised.

There are many reasons for that, but we are talking about virtio here
and not about NVMe.

>
> This is not new, and I guess that the kernel MM will (or should) handle
> the fallback you raised.

I'm afraid that it does not. Can you point me to the place where such a
fallback is implemented?

>
> Anyway, if we're doing this in NVMe, I don't see a reason to be afraid
> of doing it in virtio-blk.

Still, it would be nice to have some empirical data to support this
copy/paste. There are too many myths related to optimizations, so it
would finally be good to get some supporting data.

Thanks
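
As a side note for readers following this thread, below is a minimal,
hypothetical sketch of the two pieces under discussion; it is not code
from this series. The body of virtio_dev_to_node() is an assumption
(that helper is defined in another patch of the series and only
referenced in the diff above), and node_alloc_example() is an invented
illustration of how a tag_set.numa_node value typically reaches a
node-aware allocation, where the fallback question hinges on GFP flags:

#include <linux/device.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/virtio.h>

static inline int virtio_dev_to_node(struct virtio_device *vdev)
{
	/*
	 * Assumed implementation: report the NUMA node of the parent
	 * (e.g. PCI) device. dev_to_node() returns NUMA_NO_NODE when
	 * no affinity is known, so this degrades to today's behavior.
	 */
	return dev_to_node(vdev->dev.parent);
}

static void *node_alloc_example(int node)
{
	/*
	 * An explicit node with plain GFP_KERNEL is a preference: the
	 * page allocator may fall back to other nodes' zonelists when
	 * 'node' is depleted.
	 */
	void *preferred = kzalloc_node(4096, GFP_KERNEL, node);

	/*
	 * Adding __GFP_THISNODE makes the request strict: the
	 * allocation fails rather than spilling to another node.
	 */
	void *strict = kzalloc_node(4096, GFP_KERNEL | __GFP_THISNODE,
				    node);

	kfree(strict);
	return preferred;
}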