RE: [PATCH v8] virtio_blk: add discard and write zeroes support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Daniel Verkamp [mailto:dverkamp@xxxxxxxxxxxx]
> Sent: Tuesday, October 16, 2018 7:16 AM
> To: hch@xxxxxxxxxxxxx
> Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx;
> mst@xxxxxxxxxx; jasowang@xxxxxxxxxx; axboe@xxxxxxxxx;
> stefanha@xxxxxxxxxx; Liu, Changpeng <changpeng.liu@xxxxxxxxx>
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
> 
> On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > From: Changpeng Liu <changpeng.liu@xxxxxxxxx>
> > >
> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> >
> > There is some issues in this spec.  For one using the multiple ranges
> > also for write zeroes is rather inefficient.  Write zeroes really should
> > use the same format as read and write.
> 
> I wasn't involved in the writing of the spec, so I'll defer to Michael
> and Changpeng here, but I'm not sure how "set in stone" the virtio
> specification is, or if it can be updated somehow without breaking
> compatibility.
> 
> I agree that Write Zeroes would be simpler to implement as a single
> LBA + length rather than a list.  However, it's not really possible to
> use the same format as the regular virtio block read/write commands
> (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do
> not specify a length explicitly; length is implied by the length of
> the data buffer as defined by the virtio descriptor, but a Write
> Zeroes command does not require a data buffer.  At best, this could be
> a separate command mirroring the layout of struct virtio_blk_req but
> with data replaced with a length field; I'm not sure that buys much in
> the way of consistency.
Yeah, that's the consideration here.
> 
> > Second the unmap flag isn't properly specified at all, as nothing
> > says the device may not unmap without the unmap flag.  Please take
> > a look at the SCSI or NVMe ѕpec for some guidance.
> 
> This could probably use some clarifying text in the specification, but
> given that there is nothing in the spec describing what the device
> needs to do when unmap = 0, I would assume that the device can do
> whatever it likes, as long as the blocks read back as 0s afterwards.
> Reading back 0s is required by the definition of the Write Zeroes
> command in the same virtio spec change.  It would probably be good to
> clarify this and explicitly define what the device is allowed to do in
> response to both settings of the unmap bit.
> 
> My understanding of the corresponding feature in NVMe (the Deallocate
> bit in the Write Zeroes command) is that the only difference between
> Deallocate = 1 and 0 is that the device "should" versus "may" (no
> "shall" on either side) deallocate the corresponding blocks, but only
> if the device supports reading 0s back after blocks are deallocated.
> If the device does not support reading 0s after deallocation, it is
> not allowed to deallocate blocks as part of a Write Zeroes command
> regardless of the setting of the Deallocate bit.
> 
> Some similar wording could probably be added to the virtio spec to
> clarify the meaning of unmap, although I would prefer something that
> makes it a little clearer that the bit is only intended as a hint from
> the driver to indicate whether the device should attempt to keep
> storage allocated for the zeroed blocks, if that is indeed the
> intended behavior.
Yes, that's the original idea.  Adding a clear description to the specification may be better. 
> 
> Is there some in-kernel doc that describes what behavior the Linux
> block layer needs from a write zeroes command?
> 
> > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > > +                                             bool unmap)
> >
> > Why is this an inline function?
> 
> I don't think there's any reason it needs to be inline; I can drop the
> inline in the next revision.
> 
> Given (as far as I can tell) your concerns seem to apply to the Write
> Zeroes command specifically, would it be reasonable to start with a
> patch that just adds support for the Discard command (along with fixes
> for Ming's feedback)?  This would be sufficient for my particular use
> case (although I can't speak for Changpeng), and we can revisit Write
> Zeroes once the spec concerns are worked out.
> 
> Thanks,
> -- Daniel




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux