On Wed, Aug 12, 2020 at 10:52:58AM +0800, Ming Lei wrote:
> On Wed, Aug 12, 2020 at 10:07:06AM +0800, Baolin Wang wrote:
> > Hi Ming,
> > 
> > On Wed, Aug 12, 2020 at 07:44:19AM +0800, Ming Lei wrote:
> > > 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") starts
> > > to support multi-range discard for virtio-blk. However, the virtio-blk
> > > disk may report the max discard segment limit as 1, which is exactly
> > > what qemu does.
> > > 
> > > So far, the block layer switches to normal request merge if the max
> > > discard segment limit is 1, and multiple bios can be merged into a
> > > single segment. This can cause memory corruption in
> > > virtblk_setup_discard_write_zeroes().
> > > 
> > > Fix the issue by handling the single max discard segment case in a
> > > straightforward way.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
> > > Cc: Christoph Hellwig <hch@xxxxxx>
> > > Cc: Changpeng Liu <changpeng.liu@xxxxxxxxx>
> > > Cc: Daniel Verkamp <dverkamp@xxxxxxxxxxxx>
> > > Cc: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > Cc: Stefan Hajnoczi <stefanha@xxxxxxxxxx>
> > > Cc: Stefano Garzarella <sgarzare@xxxxxxxxxx>
> > > ---
> > >  drivers/block/virtio_blk.c | 31 +++++++++++++++++++++++--------
> > >  1 file changed, 23 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > > index 63b213e00b37..b2e48dac1ebd 100644
> > > --- a/drivers/block/virtio_blk.c
> > > +++ b/drivers/block/virtio_blk.c
> > > @@ -126,16 +126,31 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
> > >  	if (!range)
> > >  		return -ENOMEM;
> > >  
> > > -	__rq_for_each_bio(bio, req) {
> > > -		u64 sector = bio->bi_iter.bi_sector;
> > > -		u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
> > > -
> > > -		range[n].flags = cpu_to_le32(flags);
> > > -		range[n].num_sectors = cpu_to_le32(num_sectors);
> > > -		range[n].sector = cpu_to_le64(sector);
> > > -		n++;
> > > +	/*
> > > +	 * Single max discard segment means multi-range discard isn't
> > > +	 * supported, and the block layer only runs contiguity merge like
> > > +	 * a normal RW request. So we can't rely on the bios for
> > > +	 * retrieving each range's info.
> > > +	 */
> > > +	if (queue_max_discard_segments(req->q) == 1) {
> > > +		range[0].flags = cpu_to_le32(flags);
> > > +		range[0].num_sectors = cpu_to_le32(blk_rq_sectors(req));
> > > +		range[0].sector = cpu_to_le64(blk_rq_pos(req));
> > > +		n = 1;
> > > +	} else {
> > > +		__rq_for_each_bio(bio, req) {
> > > +			u64 sector = bio->bi_iter.bi_sector;
> > > +			u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
> > > +
> > > +			range[n].flags = cpu_to_le32(flags);
> > > +			range[n].num_sectors = cpu_to_le32(num_sectors);
> > > +			range[n].sector = cpu_to_le64(sector);
> > > +			n++;
> > > +		}
> > >  	}
> > >  
> > > +	WARN_ON_ONCE(n != segments);
> > 
> > I wonder if we should return an error if the discard segments are
> > incorrect, like NVMe does[1], in case the DMA does some serious damage.
> 
> It is an unlikely case:
> 
> 1) if queue_max_discard_segments() is 1, the warning can't be triggered
> 
> 2) otherwise, ELEVATOR_DISCARD_MERGE is always handled in
> bio_attempt_discard_merge(), and the segment number is always the same
> as the number of bios in the request.
> 
> If the warning is triggered, it is simply a serious bug in the block
> layer.
> 
> BTW, suppose the warning is triggered:
> 
> 1) if n < segments, it is simply a warning
> 
> 2) if n > segments, no matter whether something like nvme_setup_discard()
> is done, a serious memory corruption issue has already been caused.
> 
> So handling it in NVMe's style wouldn't make a difference.

OK. Sounds reasonable. Thanks.
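
To make the failure mode above concrete: in the pre-patch loop quoted in
the diff, range[] is sized from blk_rq_nr_discard_segments(), while
__rq_for_each_bio() walks every bio in the request, and the two counts
disagree once the block layer has merged discard bios like normal RW
requests. A sketch with hypothetical numbers (two contiguous discard
bios; illustration only, not code from the thread):

	/*
	 * Hypothetical pre-patch scenario: queue_max_discard_segments(q)
	 * is 1, and two contiguous discard bios (sectors 0-7 and 8-15)
	 * were merged into one request by the normal merge path, so the
	 * request counts as one physical segment but carries two bios.
	 */
	segments = blk_rq_nr_discard_segments(req);	/* == 1 */
	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
	if (!range)
		return -ENOMEM;

	__rq_for_each_bio(bio, req) {			/* iterates twice */
		range[n].flags = cpu_to_le32(flags);
		range[n].num_sectors = cpu_to_le32(bio->bi_iter.bi_size >> SECTOR_SHIFT);
		range[n].sector = cpu_to_le64(bio->bi_iter.bi_sector);
		n++;	/* second pass writes range[1], past the allocation */
	}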
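
For comparison, the nvme_setup_discard()-style handling Baolin asks
about would look roughly like the following in
virtblk_setup_discard_write_zeroes() (a minimal sketch, not part of the
posted patch; the -EIO value is an assumption based on the function's
int return type):

	/*
	 * Sketch of NVMe-style handling: fail the request instead of
	 * only warning when the built range count disagrees with
	 * blk_rq_nr_discard_segments(). Assumed, not from the posted patch.
	 */
	if (WARN_ON_ONCE(n != segments)) {
		kfree(range);
		return -EIO;	/* assumed error value */
	}

As Ming points out, though, once n exceeds segments the out-of-bounds
writes have already happened, so the early return would only stop the
bad payload from reaching the device.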