Re: [PATCH mdadm v2 0/2] Discard Option for Creating Arrays

Jonmichael,

> are there capabilities of REQ_OP_WRITE_ZEROES for detection of NVMe
> DLFEAT in the identify namespace information? The purpose of this
> capability is for operating systems to detect it, precisely for use
> cases like we have identified where deterministic read zero is
> required to save a tremendous amount of time and NAND endurance.

I don't believe DEAC/DLFEAT are currently wired up in the NVMe driver,
but it would be trivial to match what SCSI does in that department.
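For reference, DLFEAT is byte 33 of the NVMe Identify Namespace data structure. Bits 2:0 report what deallocated blocks read back as, bit 3 reports whether Write Zeroes supports the DEAC bit, and bit 4 reports whether the guard field of deallocated blocks holds the CRC. A minimal sketch of decoding it, assuming `id_ns` holds the raw 4096-byte structure (for instance as dumped by nvme-cli's `nvme id-ns --raw-binary`); `decode_dlfeat` and the `Dlfeat` fields are hypothetical names, not driver code:

```python
from dataclasses import dataclass

DLFEAT_OFFSET = 33  # byte offset of DLFEAT in the Identify Namespace structure


@dataclass(frozen=True)
class Dlfeat:
    reads_zeroes: bool       # bits 2:0 == 001b: deallocated blocks read as 00h
    reads_ones: bool         # bits 2:0 == 010b: deallocated blocks read as FFh
    write_zeroes_deac: bool  # bit 3: Write Zeroes supports the DEAC bit
    guard_has_crc: bool      # bit 4: guard field of deallocated blocks holds the CRC


def decode_dlfeat(id_ns: bytes) -> Dlfeat:
    """Decode DLFEAT from a raw Identify Namespace buffer (hypothetical helper)."""
    if len(id_ns) <= DLFEAT_OFFSET:
        raise ValueError("Identify Namespace buffer too short")
    dlfeat = id_ns[DLFEAT_OFFSET]
    behavior = dlfeat & 0x07  # deallocated-block read behavior field
    return Dlfeat(
        reads_zeroes=behavior == 0b001,
        reads_ones=behavior == 0b010,
        write_zeroes_deac=bool(dlfeat & 0x08),
        guard_has_crc=bool(dlfeat & 0x10),
    )
```

A namespace reporting both `reads_zeroes` and `write_zeroes_deac` is the case where a Write Zeroes with DEAC set can be used for zeroing by deallocation.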

The intent of the REQ_OP_WRITE_ZEROES interface is to provide the choice
between deallocate semantics (think discard) and allocate semantics
(think write same) for zeroing. See the BLKDEV_ZERO_NOUNMAP flag for
more info.
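From userspace, the same choice is exposed through fallocate(2) on a block device node. To my understanding of the kernel's block device fallocate handler, FALLOC_FL_ZERO_RANGE zeroes with BLKDEV_ZERO_NOUNMAP (blocks stay allocated), while FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE zeroes with deallocation permitted and no fallback to writing zeroes manually. A minimal sketch via ctypes, assuming 64-bit Linux with glibc; `zeroing_mode` and `zero_range` are hypothetical helper names, and the flag values are copied from <linux/falloc.h>:

```python
import ctypes
import ctypes.util
import os

# Mode flags, values as defined in <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
FALLOC_FL_ZERO_RANGE = 0x10


def zeroing_mode(allow_unmap: bool) -> int:
    """Choose the fallocate() mode for zeroing a range of a block device."""
    if allow_unmap:
        # Zeroing may be satisfied by deallocating (no BLKDEV_ZERO_NOUNMAP).
        return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    # Blocks stay allocated (BLKDEV_ZERO_NOUNMAP).
    return FALLOC_FL_ZERO_RANGE


def _libc_fallocate():
    # Assumes 64-bit Linux/glibc, where off_t is 64 bits wide.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fn = libc.fallocate
    fn.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int64, ctypes.c_int64]
    fn.restype = ctypes.c_int
    return fn


def zero_range(fd: int, offset: int, length: int, allow_unmap: bool) -> None:
    """Zero [offset, offset + length) of an open block device (hypothetical helper)."""
    if _libc_fallocate()(fd, zeroing_mode(allow_unmap), offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Since the deallocating variant does not fall back to writing zeroes, a caller should be prepared for it to fail with EOPNOTSUPP on devices that cannot zero that way.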

The important distinction between REQ_OP_DISCARD and REQ_OP_WRITE_ZEROES
is that the latter is a data integrity operation that produces
deterministic results. That is, it guarantees that all blocks in the
range will return zeroes on subsequent reads. REQ_OP_DISCARD, by
contrast, is a hint, and the device can, and often will, skip portions
of the range it was sent.

It was a mistake to conflate deallocation and zeroing in our initial
implementation of discards in Linux. We have painstakingly untangled the
two and now provide two distinct interfaces: REQ_OP_DISCARD tells a
device that a block range is no longer in use and that we don't care
about its contents on future reads. REQ_OP_WRITE_ZEROES, on the other
hand, aims to provide an optimal interface for clearing block ranges
given the reported characteristics of a given device.

Note that I am careful to use the REQ_OP_DISCARD and
REQ_OP_WRITE_ZEROES terminology here to describe the block layer
primitives for deallocating and zeroing block ranges. At the bottom of
the stack, a REQ_OP_WRITE_ZEROES operation could very well end up
issuing what people would think of as a "discard" operation (DSM TRIM,
WRITE SAME with UNMAP), provided the device has been identified as
doing the right thing, i.e. reliably returning zeroes for the
deallocated blocks.
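The translation step described above can be sketched as a small policy function. This is a hypothetical model, not the actual SCSI or NVMe driver logic: `nounmap` stands in for the REQ_NOUNMAP request flag that BLKDEV_ZERO_NOUNMAP sets, and `DeviceTraits` abstracts whatever the driver learned at identification time (e.g. SCSI LBPRZ or NVMe DLFEAT):

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    DISCARD = auto()       # REQ_OP_DISCARD: a hint, contents afterwards unspecified
    WRITE_ZEROES = auto()  # REQ_OP_WRITE_ZEROES: range must read back as zeroes


class Cmd(Enum):
    DEALLOCATE_HINT = auto()       # e.g. DSM TRIM, UNMAP
    ZERO_BY_DEALLOCATING = auto()  # e.g. WRITE SAME w/UNMAP, Write Zeroes w/DEAC
    ZERO_BY_WRITING = auto()       # e.g. WRITE SAME without UNMAP, Write Zeroes w/o DEAC


@dataclass(frozen=True)
class DeviceTraits:
    dealloc_reads_zeroes: bool      # e.g. SCSI LBPRZ, NVMe DLFEAT behavior 001b
    can_dealloc_when_zeroing: bool  # device can deallocate as part of zeroing


def translate(op: Op, nounmap: bool, dev: DeviceTraits) -> Cmd:
    """Map a block layer primitive to a class of device command (hypothetical policy)."""
    if op is Op.DISCARD:
        return Cmd.DEALLOCATE_HINT
    # WRITE_ZEROES: deallocating is only acceptable when the caller allows it
    # and the device is known to return zeroes for deallocated blocks.
    if not nounmap and dev.dealloc_reads_zeroes and dev.can_dealloc_when_zeroing:
        return Cmd.ZERO_BY_DEALLOCATING
    return Cmd.ZERO_BY_WRITING
```

Keeping this decision below the block layer means callers only ever choose between "discard" and "zero", while each driver decides how to satisfy that request.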

Anything operating at the block device level should be using the
REQ_OP_DISCARD/REQ_OP_WRITE_ZEROES primitives (or their corresponding
ioctls or fallocate flags). If there is a need to change how those
primitives are translated into commands for a given device, that should
be handled in the relevant device driver.

-- 
Martin K. Petersen	Oracle Linux Engineering
