On Tue, Aug 31, 2021 at 02:35:00AM +0300, Max Gurtovoy wrote:
> No need to pre-allocate a big buffer for the IO SGL anymore. If a device
> has lots of deep queues, preallocation for the sg list can consume
> substantial amounts of memory. For HW virtio-blk device, nr_hw_queues
> can be 64 or 128 and each queue's depth might be 128. This means the
> resulting preallocation for the data SGLs is big.
>
> Switch to runtime allocation for SGL for lists longer than 2 entries.
> This is the approach used by NVMe drivers so it should be reasonable for
> virtio block as well. Runtime SGL allocation has always been the case
> for the legacy I/O path so this is nothing new.
>
> The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
> support SG_CHAIN, use only runtime allocation for the SGL.
>
> Signed-off-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
> Reviewed-by: Israel Rukshin <israelr@xxxxxxxxxx>
> ---
>  drivers/block/virtio_blk.c | 37 ++++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 77e8468e8593..9a4c5d428b58 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -24,6 +24,12 @@
>  /* The maximum number of sg elements that fit into a virtqueue */
>  #define VIRTIO_BLK_MAX_SG_ELEMS 32768
>
> +#ifdef CONFIG_ARCH_NO_SG_CHAIN
> +#define VIRTIO_BLK_INLINE_SG_CNT 0
> +#else
> +#define VIRTIO_BLK_INLINE_SG_CNT 2
> +#endif
> +
>  static int virtblk_queue_count_set(const char *val,
>  		const struct kernel_param *kp)
>  {
> @@ -99,7 +105,7 @@ struct virtio_blk {
>  struct virtblk_req {
>  	struct virtio_blk_outhdr out_hdr;
>  	u8 status;
> -	struct scatterlist sg[];
> +	struct sg_table sg_table;

Please keep the sg flexible array member here instead of the pointer
arithmetics that is added instead below.

> +	err = sg_alloc_table_chained(&vbr->sg_table,
> +			blk_rq_nr_phys_segments(req),
> +			vbr->sg_table.sgl,
> +			VIRTIO_BLK_INLINE_SG_CNT);
> +	if (err)
> +		return BLK_STS_RESOURCE;
> +

This will BUG() for requests without segments (flush and discard).
You probably want a separate helper to actually map the data in the
request, extending the big switch on the op.

While we're at it, the blk_mq_start_request call should also move as
close as possible to the actual sending of the request to the host.

You'll also need to select SG_POOL now that you're using these
functions.
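
To be concrete about keeping the flexible array member, this is roughly
the shape I have in mind (untested sketch, with the inline SGL still
embedded in the request):

struct virtblk_req {
	struct virtio_blk_outhdr out_hdr;
	u8 status;
	struct sg_table sg_table;
	struct scatterlist sg[];	/* inline SGL, VIRTIO_BLK_INLINE_SG_CNT entries */
};

The chained allocation can then pass the flexible array as the first
chunk, i.e. sg_alloc_table_chained(&vbr->sg_table, nents, vbr->sg,
VIRTIO_BLK_INLINE_SG_CNT), and the tag set's cmd_size presumably grows
by VIRTIO_BLK_INLINE_SG_CNT * sizeof(struct scatterlist).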
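
For the flush/discard problem, a rough, untested sketch of the kind of
helper I mean (the name virtblk_map_data and the return convention are
just placeholders):

static int virtblk_map_data(struct blk_mq_hw_ctx *hctx, struct request *req,
		struct virtblk_req *vbr)
{
	int err;

	/* flush and discard requests carry no data segments */
	if (!blk_rq_nr_phys_segments(req))
		return 0;

	vbr->sg_table.sgl = vbr->sg;
	err = sg_alloc_table_chained(&vbr->sg_table,
				     blk_rq_nr_phys_segments(req),
				     vbr->sg_table.sgl,
				     VIRTIO_BLK_INLINE_SG_CNT);
	if (unlikely(err))
		return -ENOMEM;

	/* returns the number of mapped sg entries */
	return blk_rq_map_sg(hctx->queue, req, vbr->sg_table.sgl);
}

virtio_queue_rq can then call this unconditionally and simply skip the
data descriptors when it returns 0; the matching
sg_free_table_chained(&vbr->sg_table, VIRTIO_BLK_INLINE_SG_CNT) goes
into the completion path for requests that actually carried data.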
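
And the Kconfig side would be something like this in
drivers/block/Kconfig (the surrounding lines are from memory):

 config VIRTIO_BLK
 	tristate "Virtio block driver"
 	depends on VIRTIO
+	select SG_POOL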