On Wed, Apr 13, 2016 at 09:51:04AM -0700, James Bottomley wrote:
> On Wed, 2016-04-13 at 09:29 -0700, Bart Van Assche wrote:
> > On 04/13/2016 09:21 AM, Martin K. Petersen wrote:
> > > From a filesystem/ioctl perspective, BLKDISCARD is a hint. We
> > > should not be rounding off or aligning anything.
> >
> > Hello Martin,
> >
> > Today if a BLKDISCARD ioctl passes a non-aligned start and/or end
> > sector to the kernel then the block layer will submit invalid
> > (non-aligned) REQ_DISCARD requests to the block driver the ioctl
> > applies to. This is not acceptable. Does the above mean that you are
> > proposing to fail such BLKDISCARD ioctls with an error code?
>
> The answer would be of course not. discard is a hint so malformed
> discard gets ignored by the device and success is returned because you
> can't oblige devices to obey hints (that's why they're called hints).

Agree. For blockdev FALLOC_FL_PUNCH_HOLE I think we can simply check
for logical block size ("lbs") alignment and then pass the request to
the device with the understanding that it can do as it pleases. We
asked the device to try to deallocate blocks, and perhaps it cannot.

Just to be clear, this only applies to zeroing discard; the "discard
and who knows what you can now read back" thing that nobody likes has
been temporarily wired up to
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_NO_HIDE_STALE. :)

> However, the problem of needing a mandatory discard for scrubbing
> blocks is part of the fallocate discussion, I think.

The third fallocate mode (FALLOC_FL_ZERO_RANGE) doesn't fit with the
phrase "mandatory discard for scrubbing blocks", though if one removed
"discard" from that phrase then it would. The only thing that
ZERO_RANGE guarantees is that subsequent reads return zeroes. XFS
punches the entire range and reallocates it with unwritten extents;
ext4 fills the holes in the range with unwritten extents and converts
real extents to unwritten.
Both also write zeroes to any part of the range that doesn't align to
an FS block.

Yes, I think there are several questions to resolve here for mandatory
zeroing with FALLOC_FL_ZERO_RANGE (summarizing the issues I've come up
with so far):

a) Should blockdev fallocate accept byte-granular offset/length
   arguments, even if it has to use the page cache to write zeroes to
   the device? This is what file fallocate does today.

b) If blockdev fallocate does impose alignment requirements, should it
   return EINVAL to a request that isn't aligned to the logical block
   size?

c) If a device really really prefers that its requests are aligned to
   min_io_size (which can be much larger than the logical block size),
   should it reject requests that aren't aligned to min_io? Or perhaps
   it should take care of the alignment problems on its own somehow?

For allocate mode (the thing Mike Snitzer brought up in another thread
yesterday), the alignment problems are much easier because we're
allowed to round the start down and the end up to fit whatever
alignment we require.

Should we promote this to a storage track session at LSF next week?

--D

>
> James
>
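
For illustration only, the two alignment policies discussed above
(reject a misaligned zeroing request as in question (b), versus
rounding an allocate-mode range outward) might look roughly like the
userspace C sketch below. This is not taken from any posted patch: the
helper names are hypothetical, and it assumes the alignment is a power
of two and that start + len does not overflow.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if both the start and the length of a byte
 * range are multiples of the logical block size.
 * Assumption: lbs is a power of two.
 */
static bool range_is_lbs_aligned(uint64_t start, uint64_t len, uint32_t lbs)
{
	uint64_t mask = (uint64_t)lbs - 1;

	return ((start | len) & mask) == 0;
}

/*
 * Policy from question (b): refuse a zeroing request that is not
 * aligned to the logical block size. A zero length is rejected too,
 * as fallocate(2) does.
 */
static int check_zero_range(uint64_t start, uint64_t len, uint32_t lbs)
{
	if (len == 0 || !range_is_lbs_aligned(start, len, lbs))
		return -EINVAL;
	return 0;
}

/*
 * Allocate mode: widen [*start, *end) so that both ends land on an
 * alignment boundary, rounding the start down and the end up.
 * Assumption: align is a power of two.
 */
static void round_range_outward(uint64_t *start, uint64_t *end, uint32_t align)
{
	uint64_t mask = (uint64_t)align - 1;

	*start &= ~mask;
	*end = (*end + mask) & ~mask;
}
```

With a 512-byte logical block size, check_zero_range(100, 512, 512)
returns -EINVAL, while rounding the range [1000, 5000) outward to a
4096-byte boundary widens it to [0, 8192). Rounding outward is only
safe for allocate mode; doing the same for zeroing would destroy data
outside the range the caller asked for.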