On Thu, Oct 26, 2017 at 12:32:17PM -0600, Keith Busch wrote:
> On Thu, Oct 26, 2017 at 01:01:29PM -0500, Eric Sandeen wrote:
> > On 10/26/17 12:49 PM, Eric Sandeen wrote:
> > > Yeah, lots of devices are unhappy with large discards.  And yeah,
> > > in the end I think this papers over a kernel and/or hardware
> > > problem.
> > >
> > > But sometimes we do that, if only to keep things working
> > > reasonably well with older kernels or hardware that'll never get
> > > fixed...
> > >
> > > (TBH sometimes I regret putting mkfs-time discard in by default
> > > in the first place.)
> >
> > I think I left this on a too-positive note.  It seems pretty clear
> > that there is no way to fix all of userspace to not issue "too big"
> > discards, when "too big" isn't even well-defined, or specified by
> > anything at all.
>
> Yeah, I totally get this proposal is just a bandaid, and other user
> space programs may suffer when used with devices behaving this way.
> XFS is just very popular, so it's frequently reported as problematic
> against large capacity devices.

Sure, but now you have to go fix mke2fs and everything /else/ that
issues BLKDISCARD (or FALLOC_FL_PUNCH_HOLE) on a large file / device,
and until you fix every program to work around this weird thing in the
kernel there'll still be someone somewhere with this timeout problem...

...so I started digging into what the kernel does with a BLKDISCARD
request, which is to say that I looked at blkdev_issue_discard.  That
function uses blk_*_plug() to wrap __blkdev_issue_discard, which in
turn splits the request into a chain of UINT_MAX-sized struct bios.
128G's worth of 4G IOs == 32 chained bios; 2T's worth of 4G IOs == 512
chained bios.

So now I'm wondering: is the problem more that the first bio in the
chain times out because the last one hasn't finished yet, so the whole
thing gets aborted because we chained too much work together?  Would it
make sense to fix __blkdev_issue_discard to chain fewer bios together?
Or just issue the bios independently and track the completions
individually?

> > I'm not wise in the ways of queueing and throttling, but from my
> > naive perspective, it seems like something to be fixed in the
> > kernel, or if it can't, export some new "maximum discard request
> > size" which can be trusted?

I would've thought that's what max_discard_sectors was for, but... eh.

How big is the device you were trying to mkfs, anyway?

--D

> The problem isn't really that a discard sent to the device was "too
> big".  It's that "too many" are issued at the same time, and there
> isn't a way for a driver to limit the number of outstanding discards
> without affecting read/write.
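
For concreteness, a minimal sketch of the userspace pattern under
discussion: a tool like mkfs discovers the device size and hands the
whole thing to the kernel in a single BLKDISCARD ioctl, whose argument
is a {start, length} pair in bytes.  The helper name and the trimmed
error handling are illustrative, not taken from mkfs.xfs or mke2fs.

    /*
     * Hypothetical helper, loosely modelled on what mkfs-time discard
     * does: one BLKDISCARD covering the whole device.
     */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>          /* BLKDISCARD, BLKGETSIZE64 */

    static int discard_whole_device(int fd)
    {
            uint64_t dev_bytes;
            uint64_t range[2];

            if (ioctl(fd, BLKGETSIZE64, &dev_bytes))
                    return -1;

            range[0] = 0;               /* start of device */
            range[1] = dev_bytes;       /* ...to the very end */

            /* No size cap; the kernel decides how to split this up. */
            return ioctl(fd, BLKDISCARD, range);
    }

For a regular file, the analogous call is fallocate(fd,
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, length).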
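The chained-bio arithmetic quoted above can be checked directly.  The
sketch below uses exactly 4G per bio as a stand-in for the
UINT_MAX-sized split (the kernel also aligns each bio down to the
discard granularity, which this ignores):

    #include <stdio.h>

    int main(void)
    {
            /* Each bio covers ~UINT_MAX bytes; call it 4G even. */
            unsigned long long per_bio = 1ULL << 32;
            unsigned long long sizes[] = { 128ULL << 30, 2ULL << 40 };

            for (int i = 0; i < 2; i++)
                    printf("%4lluG discard -> %llu chained bios\n",
                           sizes[i] >> 30,
                           (sizes[i] + per_bio - 1) / per_bio);
            /* Prints 32 for 128G and 512 for 2T, matching the email. */
            return 0;
    }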
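And for comparison, a sketch of the userspace bandaid being debated:
chop the region into fixed-size pieces so that no single request chains
hundreds of bios.  The 2G chunk size is an arbitrary placeholder; a
real implementation might instead read the queue's advertised limit
from /sys/block/<dev>/queue/discard_max_bytes, which is the
userspace-visible form of max_discard_sectors.

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    #define DISCARD_CHUNK   (2ULL << 30)    /* placeholder: 2G/call */

    static int discard_in_chunks(int fd, uint64_t start, uint64_t len)
    {
            while (len > 0) {
                    uint64_t range[2];

                    range[0] = start;
                    range[1] = len < DISCARD_CHUNK ? len : DISCARD_CHUNK;

                    /* Each chunk is its own ioctl: a short bio chain. */
                    if (ioctl(fd, BLKDISCARD, range))
                            return -1;

                    start += range[1];
                    len -= range[1];
            }
            return 0;
    }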