Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems

Brian Foster <bfoster@xxxxxxxxxx> · Fri, 3 Nov 2017 07:26:27 -0400

On Fri, Nov 03, 2017 at 10:30:17AM +1100, Dave Chinner wrote:
> On Thu, Nov 02, 2017 at 07:25:33AM -0400, Brian Foster wrote:
> > On Thu, Nov 02, 2017 at 10:53:00AM +1100, Dave Chinner wrote:
> > > On Wed, Nov 01, 2017 at 10:17:21AM -0400, Brian Foster wrote:
> > > > On Wed, Nov 01, 2017 at 11:45:13AM +1100, Dave Chinner wrote:
> > > > > On Tue, Oct 31, 2017 at 07:24:32AM -0400, Brian Foster wrote:
> > > > > > On Tue, Oct 31, 2017 at 08:09:41AM +1100, Dave Chinner wrote:
> > > > > > > On Mon, Oct 30, 2017 at 09:31:17AM -0400, Brian Foster wrote:
> > > > > > > > On Thu, Oct 26, 2017 at 07:33:08PM +1100, Dave Chinner wrote:
...
> > > > BTW, was there ever any kind of solution to the metadata block
> > > > reservation issue in the thin case? We now hide metadata reservation
> > > > from the user via the m_usable_blocks account. If m_phys_blocks
> > > > represents a thin volume, how exactly do we prevent those metadata
> > > > allocations/writes from overrunning what the admin has specified as
> > > > "usable" with respect to the thin volume?
> > > 
> > > The reserved metadata blocks are not accounted from free space when
> > > they are allocated - they are pulled from the reserved space that
> > > has already been removed from the free space.
> > > 
> > 
> > Ok, so the user can set a usable blocks value of something less than the
> > fs geometry, then the reservation is pulled from that, reducing the
> > reported "usable" value further. Hence, what ends up reported to the
> > user is actually something less than the value set by the user, which
> > means that the filesystem overall respects how much space the admin says
> > it can use in the underlying volume.
> > 
> > For example, the user creates a 100T thin volume with 10T of usable
> > space. The fs reserves a further 2T out of that for metadata, so then
> > what the user sees is 8T of writeable space.  The filesystem itself
> > cannot use more than 10T out of the volume, as instructed. Am I
> > following that correctly? If so, that sounds reasonable to me from the
> > "don't overflow my thin volume" perspective.
> 
> No, that's not what happens. For thick filesystems, the 100TB volume
> gets 2TB pulled from it so it appears as a 98TB filesystem. This is
> done by modifying the free block counts and m_usable_space when the
> reservations are made.
> 

Ok..

> For thin filesystems, we've already got 90TB of space "reserved",
> and so the metadata reservations and allocations come from that.
> i.e. we skip the modification of free block counts and m_usable
> space in the case of a thinspace filesystem, and so the user still
> sees 10TB of usable space that they asked to have.
> 

Hmm.. so then I'm slightly confused regarding the thin use case
regarding prevention of pool depletion. The usable blocks value that the
user settles on is likely based on how much space the filesystem should
use to safely avoid pool depletion. If a usable value of 10T means the
filesystem can write to the usable 10T + some amount of metadata
reservation, how does the user determine a sane usable value based on
the current pool geometry?

> > > i.e. we don't need to rev the interface to support shrink on thin
> > > filesystems, so there's no need to rev the interface at this point
> > > in time.
> > > 
> > > *If* we implement physical shrink, *then* we can rev the growfs
> > > interface to allow users to run a physical shrink on thin
> > > filesystems.
> > > 
> > 
> > Subsequently, pretty much the remainder of my last mail is based on the
> > following predicates:
> > 
> > - We've incorporated this change to use growfs->newblocks for logical
> >   shrink.
> > - We've implemented physical shrink.
> > - We've revved the growfs interface to support physical shrink on thin
> >   filesystems.
> > 
> > I'm not going to repeat all of the previous points... suffice it to say
> > that asserting that we only have to rev the interface if/when we support
> > physical shrink in response is a circular argument.
> 
> No, it's not.
> 
> History tells us that every time someone does this sort of "future
> planning" for a fixed ABI interface they don't get it right. Hence
> when that new functionality comes along the interface has to be
> changed /again/ to fix whatever previous assumptions or requirements
> were found to be invalid when development of the feature was
> actually done.
> 
> The syscall and ioctl APIs are littered with examples and we should
> pay heed to the lessons those mistakes teach us. i.e. don't
> speculate about future API requirements for functionality that may
> not exist or may change substantially from what it is envisiaged to
> need right now.
> 

Sure, that makes sense...

> > I understand that
> > and I agree. I'm attempting to review how that would look due to the
> > implementation of this feature, particularly with respect to backwards
> > compatibility of the existing interface.
> > 
> > IOW, you're using the argument that we can rev the growfs interface in
> > response to the initial argument regarding the the inability to
> > physically shrink a thin fs.
> 
> Yes, because when we get physical shrink it is likely that the
> growfs interface is going to need more than just a new set of flags.
> 
> Off the top of my head, we're going to new user-kernel co-ordination
> requirements such as read-only AGs while we move data out of them in
> a failsafe manner, interleave this with a whole new set of new
> "cannot perform operation" cases, handle new partial failure cases,
> restarting an interrupted shrink, and maybe even a need to ignore
> partial state to complete a previously failed shrink.
> 
> And there's even the possibility that we're going to have to limit
> physical shrink to rmap enabled filesystems (e.g. how do you find the
> owner of that bmbt block that is pinning the AG we need to remove
> without rmap?). At whcih point, a user might want to "force shrink"
> and then take the filesystem down and run repair....
> 
> IOWs, to say all we are going to need from the growfs ioctl for a
> physical shrink interface is a flag to to tell thin filesystems to
> do a physical shrink is assuming an awful lot about the physical
> shrinking implementation and how users will want to interact with
> the operation. Until we've done all the design work and are well
> into the implementation, we're not going to have a clue about what
> we actually need in way of grwofs interface changes for the physical
> shrink.
> 
> As such, trying to determine a future proof growfs interface change
> right now is simply a waste of time because we're not going to get
> it right.
> 

I'm not sure where the flags proposal thing came from, but regardless...
I'm not trying to future proof/design a shrink interface. Rather, just
reviewing the (use of the) interface as it already exists. I suppose I
am kind of assuming that the current interface would enable some form of
functionality in that hypothetical future world where physical shrink
exists. Perhaps that is not so realistic, however, as you suggest.

I'm not totally convinced of that that based on a dump of the challenges
the implementation would have to deal with, but I'm also not spending
much time thinking about the design/implementation of shrink. So I think
that's a reasonable enough argument to allow this feature to creep in on
the interface.

> > As a result, I'm claiming that revving the
> > interface in the future for physical shrink may create more interface
> > clumsiness than it's worth compared to just revving it now for logical
> > shrink. In response to the points I attempt to make around that, you
> > argue above that we aren't any closer to physical shrink than we were 10
> > years ago and that we don't have to rev the interface unless we support
> > physical shrink. Round and round... ;P
> > 
> > The best I can read into the response here is that you think physical
> > shrink is unlikely enough to not need to care very much what kind of
> > interface confusion could result from needing to rev the current growfs
> > interface to support physical shrink on thin filesystems in the future.
> > Is that a fair assessment..?
> 
> Not really. I understand just how complex a physical shrink
> implementation is going to be, and have a fair idea of the sorts of
> craziness we'll need to add to xfs_growfs to support/co-ordinate a
> physical shrink operation.  From that perspective, I don't see a
> physical shrink working with an unchanged growfs interface. The
> discussion about whether or not we should physically shrink
> thinspace filesystems is almost completely irrelevant to the
> interface requirements of a physical shrink....
> 

So it's not so much about the likelihood of realizing physical shrink,
but rather the likelihood that physical shrink would require to rev the
growfs structure anyways (regardless of this feature).

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html