On 16-04-13 01:30 PM, Darrick J. Wong wrote:
On Wed, Apr 13, 2016 at 09:51:04AM -0700, James Bottomley wrote:
On Wed, 2016-04-13 at 09:29 -0700, Bart Van Assche wrote:
On 04/13/2016 09:21 AM, Martin K. Petersen wrote:
From a filesystem/ioctl perspective, BLKDISCARD is a hint. We should not
be rounding off or aligning anything.
Hello Martin,
Today if a BLKDISCARD ioctl passes a non-aligned start and/or end
sector to the kernel then the block layer will submit invalid
(non-aligned) REQ_DISCARD requests to the block driver the ioctl applies
to. This is not acceptable. Does the above mean that you are
proposing to fail such BLKDISCARD ioctls with an error code?
The answer would be: of course not. Discard is a hint, so a malformed
discard gets ignored by the device and success is returned, because you
can't oblige devices to obey hints (that's why they're called hints).
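
For reference, this is roughly how a BLKDISCARD request reaches the kernel
from userspace. It is a minimal sketch, assuming a Linux system with
<linux/fs.h> available. discard_range() is a hypothetical helper name, not
an existing API. The ioctl takes a byte offset and a byte length, and nothing
on the caller's side forces either value to be aligned.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/*
 * Ask the kernel to discard the byte range [start, start + len) on the
 * block device behind fd. Returns 0 on success or a negative errno value.
 * discard_range() is a hypothetical helper, not a kernel or libc function.
 */
int discard_range(int fd, uint64_t start, uint64_t len)
{
    uint64_t range[2] = { start, len };   /* byte offset, byte length */

    if (ioctl(fd, BLKDISCARD, range) < 0)
        return -errno;
    return 0;
}
```

A zero return here only says the request was accepted. Since discard is a
hint, it says nothing about whether the device deallocated anything.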
Agree. For blockdev FALLOC_FL_PUNCH_HOLE I think we can simply check for
logical block size ("lbs") alignment and then pass the request to the
device with the understanding that it can do as it pleases. We asked the
device to try to deallocate blocks, and perhaps it cannot.
Just to be clear, this only applies to zeroing discard; the "discard and who
knows what you can now read back" thing that nobody likes has been temporarily
wired up to FALLOC_FL_PUNCH_HOLE | FALLOC_FL_NO_HIDE_STALE. :)
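
The logical block size check mentioned above could look something like the
following. This is only a sketch: range_lbs_aligned() is a hypothetical name,
the logical block size is assumed to be a power of two, and what to do when
the check fails (EINVAL or something else) is deliberately left out because
that is still an open question in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if both the start offset and the length (in bytes) are
 * multiples of the logical block size. lbs is assumed to be a non-zero
 * power of two such as 512 or 4096; anything else is reported as
 * "not aligned". range_lbs_aligned() is a hypothetical helper name.
 */
bool range_lbs_aligned(uint64_t start, uint64_t len, uint32_t lbs)
{
    uint64_t mask;

    if (lbs == 0 || (lbs & (lbs - 1)) != 0)
        return false;               /* not a power of two */

    mask = (uint64_t)lbs - 1;
    return ((start | len) & mask) == 0;
}
```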
In May last year, T10 added another wrinkle when they expanded the LBPRZ
field from 1 to 3 bits (in the LBP VPD page but _not_ in the READ
CAPACITY(16) response). The expansion is to allow a new response when
an unmapped logical block is read: return a "provisioning initialization
pattern". That new piece of jargon is defined as a "non-zero pattern that
is the length of one logical block".
It seems that the "provisioning initialization pattern" is the same for
every unmapped logical block and is chosen by the manufacturer. It can
be read with the new REPORT PROVISIONING INITIALIZATION PATTERN command.
If LBPRZ=2 and FORMAT UNIT is called with an "initialization pattern"
equal to the disk's "provisioning initialization pattern" then all
logical blocks are unmapped. Clear?
Doug Gilbert
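
To make the 3-bit LBPRZ field concrete, here is a small decoding sketch.
Several things in it are assumptions rather than statements from the message
above: that the Logical Block Provisioning VPD page has page code 0xB2, that
LBPRZ sits in bits 4..2 of byte 5, and that the value 1 keeps the old
one-bit meaning of "reads return zeroes". The function and enum names are
hypothetical. Values other than 1 and 2 are not interpreted; check SBC-4
before relying on any of this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of the LBPRZ value. */
enum lbprz_kind {
    LBPRZ_KIND_ZEROES,        /* 1: unmapped blocks read back as zeroes */
    LBPRZ_KIND_INIT_PATTERN,  /* 2: provisioning initialization pattern */
    LBPRZ_KIND_OTHER          /* anything this sketch does not interpret */
};

/*
 * Extract LBPRZ from a raw LBP VPD page. Returns the 3-bit value, or -1 if
 * the buffer is too short or is not page 0xB2. The byte and bit positions
 * are assumptions (byte 5, bits 4..2).
 */
int lbp_vpd_lbprz(const uint8_t *vpd, size_t len)
{
    if (vpd == NULL || len < 6 || vpd[1] != 0xb2)
        return -1;
    return (vpd[5] >> 2) & 0x7;
}

enum lbprz_kind lbprz_classify(int lbprz)
{
    if (lbprz == 1)
        return LBPRZ_KIND_ZEROES;
    if (lbprz == 2)
        return LBPRZ_KIND_INIT_PATTERN;
    return LBPRZ_KIND_OTHER;
}
```

Note that a reader which only looks at the single LBPRZ bit in READ
CAPACITY(16) cannot see the value 2 at all, which is the wrinkle described
above.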
However, the problem of needing a mandatory discard for scrubbing
blocks is part of the fallocate discussion, I think.
The third fallocate mode (FALLOC_FL_ZERO_RANGE) doesn't fit with the phrase
"mandatory discard for scrubbing blocks", though if one removed "discard" from
that phrase then it would. The only thing that ZERO_RANGE guarantees is that
subsequent reads return zeroes. XFS punches the entire range and reallocates
it with unwritten extents; ext4 fills the holes in the range with unwritten
extents and converts real extents to unwritten. Both also write zeroes to any
part of the range that doesn't align to an FS block.
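
The "write zeroes to the part that doesn't align" behaviour can be pictured
as splitting a byte range into an unaligned head, a block-aligned middle and
an unaligned tail. The sketch below is not the XFS or ext4 code; the struct
and function names are hypothetical, the block size is assumed to be a power
of two, and off + len is assumed not to overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: all offsets and lengths are in bytes. */
struct zero_split {
    uint64_t head_off, head_len;  /* unaligned front: zeroes get written */
    uint64_t mid_off,  mid_len;   /* whole blocks: punch or mark unwritten */
    uint64_t tail_off, tail_len;  /* unaligned back: zeroes get written */
};

struct zero_split split_zero_range(uint64_t off, uint64_t len, uint64_t bs)
{
    struct zero_split s = { 0 };
    uint64_t end = off + len;
    uint64_t mid_start, mid_end;

    assert(bs != 0 && (bs & (bs - 1)) == 0);   /* power of two only */

    mid_start = (off + bs - 1) & ~(bs - 1);    /* round start up */
    mid_end = end & ~(bs - 1);                 /* round end down */

    if (mid_start >= mid_end) {
        /* No whole block inside the range: write zeroes over all of it. */
        s.head_off = off;
        s.head_len = len;
        return s;
    }

    s.head_off = off;
    s.head_len = mid_start - off;
    s.mid_off = mid_start;
    s.mid_len = mid_end - mid_start;
    s.tail_off = mid_end;
    s.tail_len = end - mid_end;
    return s;
}
```

For example, off=1000, len=10000 with a 4096-byte block gives a 3096-byte
head, one whole block in the middle, and a 2808-byte tail.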
Yes, I think there are several questions to resolve here for mandatory zeroing
with FALLOC_FL_ZERO_RANGE (summarizing the issues I've come up with so far):
a) Should blockdev fallocate accept byte-granular offset/length arguments, even
if it has to use the page cache to write zeroes to the device? This is what
file fallocate does today.
b) If blockdev fallocate does impose alignment requirements, should it return
EINVAL to a request that isn't aligned to the logical block size?
c) If a device really really prefers that its requests are aligned to
min_io_size (which can be much larger than the logical block size), should it
reject requests that aren't aligned to min_io? Or perhaps it should take care
of the alignment problems on its own somehow?
For allocate mode (the thing Mike Snitzer brought up in another thread
yesterday), the alignment problems are much easier because we're allowed to
round the start down and the end up to fit whatever alignment we require.
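
Rounding outward for allocate mode is simple arithmetic; a sketch, assuming
the required alignment is a power of two and that the end offset is
exclusive. The names are hypothetical. The result always covers the
requested range, which is why this is safe for allocation and not for
zeroing or discarding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range with an exclusive end offset. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Widen [start, end) so that both ends are multiples of align: the start
 * is rounded down and the end is rounded up. Overflow of end + align is
 * not handled in this sketch.
 */
struct byte_range round_outward(uint64_t start, uint64_t end, uint64_t align)
{
    struct byte_range r;

    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    assert(start <= end);

    r.start = start & ~(align - 1);
    r.end = (end + align - 1) & ~(align - 1);
    return r;
}
```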
Should we promote this to a storage track session at LSF next week?
--D
James
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html