Re: [RFC PATCH] block: xfs: dm thin: train XFS to give up on retrying IO if thinp is out of space

Mike Snitzer <snitzer@xxxxxxxxxx> · Thu, 23 Jul 2015 10:33:52 -0400

On Thu, Jul 23 2015 at  1:10am -0400,
Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Wed, Jul 22, 2015 at 11:28:06AM -0500, Eric Sandeen wrote:
> > On 7/22/15 8:34 AM, Mike Snitzer wrote:
> > > On Tue, Jul 21 2015 at 10:37pm -0400,
> > > Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >> On Tue, Jul 21, 2015 at 09:40:29PM -0400, Mike Snitzer wrote:
> > >>> I'm open to considering alternative interfaces for getting you the info
> > >>> you need.  I just don't have a great sense for what mechanism you'd like
> > >>> to use.  Do we invent a new block device operations table method that
> > >>> sets values in a 'struct no_space_strategy' passed in to the
> > >>> blockdevice?
> > >>
> > >> It's long been frowned on having the filesystems dig into block
> > >> device structures. We have lots of wrapper functions for getting
> > >> information from or performing operations on block devices. (e.g.
> > >> bdev_read_only(), bdev_get_queue(), blkdev_issue_flush(),
> > >> blkdev_issue_zeroout(), etc) and so I think this is the pattern we'd
> > >> need to follow. If we do that - bdev_get_nospace_strategy() - then
> > >> how that information gets to the filesystem is completely opaque
> > >> at the fs level, and the block layer can implement it in whatever
> > >> way is considered sane...
> > >>
> > >> And, realistically, all we really need returned is an enum to tell us
> > >> how the bdev behaves on enospc:
> > >> 	- bdev fails fast, (i.e. immediate ENOSPC)
> > >> 	- bdev fails slow, (i.e. queue for some time, then ENOSPC)
> > >> 	- bdev never fails (i.e. queue forever)
> > >> 	- bdev doesn't support this (i.e. EOPNOTSUPP)
> > 
> > I'm not sure how this is more useful than the bdev simply responding to
> > a query of "should we keep trying IOs?"
> 
> 	- bdev fails fast, (i.e. immediate ENOSPC)
> 
> XFS should use a bounded retry behaviour to allow for the possibility of
> the admin adding more space before we shut down the fs. i.e.
> XFS fails slow.
> 
> 	- bdev fails slow, (i.e. queue for some time, then ENOSPC)
> 
> We know that IOs are going to be delayed before they are failed, so
> there's no point in retrying as the admin has already had a chance
> to resolve the ENOSPC condition before failure was reported. i.e.
> XFS fails fast.
> 
> 	- bdev never fails (i.e. queue forever)
> 
> Block device will appear to hang when it runs out of space. Nothing
> XFS can do here because IOs never fail, but we need to note this in
> the log at mount time so that filesystem hangs are easily explained
> when reported to us.
> 
> 	- bdev doesn't support this (i.e. EOPNOTSUPP)
> 
> XFS uses default "retry forever" behaviour.
> 
> > > This 'struct no_space_strategy' would be invented purely for
> > > informational purposes for upper layers' benefit -- I don't consider it
> > > a "block device structure" in the traditional sense.
> > > 
> > > I was thinking upper layers would like to know the actual timeout value
> > > for the "fails slow" case.  As such the 'struct no_space_strategy' would
> > > have the enum and the timeout.  And would be returned with a call:
> > >      bdev_get_nospace_strategy(bdev, &no_space_strategy)
> > 
> > Asking for the timeout value seems to add complexity.  It could change after
> > we ask, and knowing it now requires another layer to be handling timeouts...
> 
> I don't think knowing the bdev timeout is necessary because the
> default is most likely to be "fail fast" in this case. i.e. no
> retries, just shut down.  IOWs, if we describe the configs and
> actions in neutral terms, then the default configurations are easy for
> users to understand. i.e:
> 
> bdev enospc		XFS default
> -----------		-----------
> Fail slow		Fail fast
> Fail fast		Fail slow
> Fail never		Fail never, Record in log
> EOPNOTSUPP		Fail never
> 
> With that in mind, I'm thinking I should drop the
> "permanent/transient" error classifications, and change it to "failure
> behaviour" with the options "fast slow [never]" and only the slow
> option has retry/timeout configuration options.  I think the "never"
> option still needs the "fail at unmount" config variable, but we
> enable it by default rather than hanging unmount and requiring a
> manual shutdown like we do now....

This all sounds good to me.  The simpler XFS configuration looks like a
nice improvement.

If you just want to stub out the call to bdev_get_nospace_strategy() I
can crank through implementing it once I get a few minutes.
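
For concreteness, here is a rough userspace sketch of what such a stub and
the bdev-to-XFS default mapping from the table above might look like.  Every
name in it other than bdev_get_nospace_strategy() is made up for
illustration (struct fake_bdev, enum nospace_strategy, enum
fs_nospace_policy, fs_default_nospace_policy()), and the signature is only
a guess at the enum-returning form discussed above; none of this is an
existing kernel interface.

```c
/*
 * Userspace sketch only.  All types and helpers here are hypothetical
 * stand-ins, not an existing block layer or XFS API.
 */
#include <errno.h>
#include <stddef.h>

/* How the block device behaves when it runs out of space. */
enum nospace_strategy {
	NOSPACE_FAIL_FAST,	/* immediate ENOSPC */
	NOSPACE_FAIL_SLOW,	/* queue for some time, then ENOSPC */
	NOSPACE_FAIL_NEVER,	/* queue forever */
};

/* Filesystem-side default chosen in response. */
enum fs_nospace_policy {
	FS_FAIL_FAST,		/* no retries, shut down */
	FS_FAIL_SLOW,		/* bounded retries, then shut down */
	FS_FAIL_NEVER,		/* retry forever */
	FS_FAIL_NEVER_LOGGED,	/* retry forever, note it in the log */
};

/* Stand-in for a block device; a real one would be struct block_device. */
struct fake_bdev {
	int has_strategy;	/* 0: device cannot report a strategy */
	enum nospace_strategy strategy;
};

/*
 * Wrapper in the style of bdev_read_only() and friends: the filesystem
 * never looks inside the device.  Returns 0 and fills *out on success,
 * -EOPNOTSUPP if the device does not report a strategy.
 */
static int bdev_get_nospace_strategy(const struct fake_bdev *bdev,
				     enum nospace_strategy *out)
{
	if (bdev == NULL || out == NULL)
		return -EINVAL;
	if (!bdev->has_strategy)
		return -EOPNOTSUPP;
	*out = bdev->strategy;
	return 0;
}

/* Map the device behaviour to the filesystem default, per the table. */
static enum fs_nospace_policy
fs_default_nospace_policy(const struct fake_bdev *bdev)
{
	enum nospace_strategy s;

	/* Any error, including EOPNOTSUPP: keep "retry forever". */
	if (bdev_get_nospace_strategy(bdev, &s) != 0)
		return FS_FAIL_NEVER;

	switch (s) {
	case NOSPACE_FAIL_FAST:
		return FS_FAIL_SLOW;
	case NOSPACE_FAIL_SLOW:
		return FS_FAIL_FAST;
	case NOSPACE_FAIL_NEVER:
		return FS_FAIL_NEVER_LOGGED;
	}
	return FS_FAIL_NEVER;
}
```

In this sketch a stubbed-out implementation would simply behave like a
device with has_strategy == 0, so the filesystem keeps its current
"retry forever" default until the block layer side is filled in.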

Btw, not sure what I was thinking when suggesting XFS would benefit from
knowing the duration of the thinp no_space_timeout.