On 10/26/17 11:25 AM, Darrick J. Wong wrote: > On Thu, Oct 26, 2017 at 08:41:31AM -0600, Keith Busch wrote: >> Running mkfs.xfs was discarding the entire capacity in a single range. The >> block layer would split these into potentially many smaller requests >> and dispatch all of them to the device at roughly the same time. >> >> SSD capacities are getting so large that full capacity discards will >> take some time to complete. When discards are deeply queued, the block >> layer may trigger timeout handling and IO failure, though the device is >> operating normally. >> >> This patch uses smaller discard ranges in a loop for mkfs to avoid >> risking such timeouts. The max discard range is arbitrarilly set to >> 128GB in this patch. > > I'd have thought devices would set sane blk_queue_max_discard_sectors > so that the block layer doesn't send such a huge command that the kernel > times out... > > ...but then I actually went and grepped that in the kernel and > discovered that nbd, zram, raid0, mtd, and nvme all pass in UINT_MAX, > which is 2T. Frighteningly xen-blkfront passes in get_capacity() (which > overflows the unsigned int parameter on big virtual disks, I guess?). > > (I still think this is the kernel's problem, not userspace's, but now > with an extra layer of OMGWTF sprayed on.) > > I dunno. What kind of device produces these timeouts, and does it go > away if max_discards is lowered? Yeah, lots of devices are unhappy with large discards. And yeah, in the end I think this papers over a kernel and/or hardware problem. But sometimes we do that, if only to keep things working reasonably well with older kernels or hardware that'll never get fixed... (TBH sometimes I regret putting mkfs-time discard in by default in the first place.) -Eric > --D > >> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> >> --- >> include/linux.h | 13 ++++++++++--- >> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/include/linux.h b/include/linux.h >> index 6ce344c5..702aee0c 100644 >> --- a/include/linux.h >> +++ b/include/linux.h >> @@ -132,10 +132,17 @@ static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src) >> static __inline__ int >> platform_discard_blocks(int fd, uint64_t start, uint64_t len) >> { >> - uint64_t range[2] = { start, len }; >> + uint64_t end = start + len; >> + uint64_t size = 128ULL * 1024ULL * 1024ULL * 1024ULL; >> >> - if (ioctl(fd, BLKDISCARD, &range) < 0) >> - return errno; >> + for (; start < end; start += size) { >> + uint64_t range[2] = { start, MIN(len, size) }; >> + >> + len -= range[1]; >> + if (ioctl(fd, BLKDISCARD, &range) < 0) >> + return errno; >> + >> + } >> return 0; >> } >> >> -- >> 2.13.6 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html