Re: [PATCH] xfsprogs: Issue smaller discards at mkfs

On 10/26/17 11:25 AM, Darrick J. Wong wrote:
> On Thu, Oct 26, 2017 at 08:41:31AM -0600, Keith Busch wrote:
>> Running mkfs.xfs was discarding the entire capacity in a single range. The
>> block layer would split these into potentially many smaller requests
>> and dispatch all of them to the device at roughly the same time.
>>
>> SSD capacities are getting so large that full capacity discards will
>> take some time to complete. When discards are deeply queued, the block
>> layer may trigger timeout handling and IO failure, though the device is
>> operating normally.
>>
>> This patch uses smaller discard ranges in a loop for mkfs to avoid
>> risking such timeouts. The max discard range is arbitrarily set to
>> 128GB in this patch.
> 
> I'd have thought devices would set sane blk_queue_max_discard_sectors
> so that the block layer doesn't send such a huge command that the kernel
> times out...
> 
> ...but then I actually went and grepped that in the kernel and
> discovered that nbd, zram, raid0, mtd, and nvme all pass in UINT_MAX,
> which is 2T.  Frighteningly xen-blkfront passes in get_capacity() (which
> overflows the unsigned int parameter on big virtual disks, I guess?).
> 
> (I still think this is the kernel's problem, not userspace's, but now
> with an extra layer of OMGWTF sprayed on.)
> 
> I dunno.  What kind of device produces these timeouts, and does it go
> away if max_discards is lowered?

Yeah, lots of devices are unhappy with large discards.  And yeah, in the
end I think this papers over a kernel and/or hardware problem.

But sometimes we do that, if only to keep things working reasonably
well with older kernels or hardware that'll never get fixed...
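[Editorial note: Darrick's question about whether lowering max_discards makes the timeouts go away can be probed from userspace via sysfs. A hedged sketch follows; "sdX" is a placeholder device name, and whether discard_max_bytes is writable (to clamp below the hardware limit) depends on the kernel version.]

```shell
# Sketch: inspect, and optionally clamp, the block layer's per-command
# discard limit via sysfs. "sdX" is a placeholder; substitute a real
# device. Writability of discard_max_bytes is kernel-dependent.
dev=sdX
q=/sys/block/$dev/queue
if [ -r "$q/discard_max_bytes" ]; then
	# Current per-command discard limit, in bytes:
	cat "$q/discard_max_bytes"
	# Hardware ceiling, if the kernel exposes it:
	cat "$q/discard_max_hw_bytes" 2>/dev/null
	# Clamp to 128GB, mimicking in the kernel what this patch
	# does in userspace:
	echo $((128 * 1024 * 1024 * 1024)) > "$q/discard_max_bytes"
fi
```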

(TBH sometimes I regret putting mkfs-time discard in by default in the
first place.)

-Eric



> --D
> 
>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
>> ---
>>  include/linux.h | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux.h b/include/linux.h
>> index 6ce344c5..702aee0c 100644
>> --- a/include/linux.h
>> +++ b/include/linux.h
>> @@ -132,10 +132,17 @@ static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src)
>>  static __inline__ int
>>  platform_discard_blocks(int fd, uint64_t start, uint64_t len)
>>  {
>> -	uint64_t range[2] = { start, len };
>> +	uint64_t end = start + len;
>> +	uint64_t size = 128ULL * 1024ULL * 1024ULL * 1024ULL;
>>  
>> -	if (ioctl(fd, BLKDISCARD, &range) < 0)
>> -		return errno;
>> +	for (; start < end; start += size) {
>> +		uint64_t range[2] = { start, MIN(len, size) };
>> +
>> +		len -= range[1];
>> +		if (ioctl(fd, BLKDISCARD, &range) < 0)
>> +			return errno;
>> +
>> +	}
>>  	return 0;
>>  }
>>  
>> -- 
>> 2.13.6
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html