Re: [RFC PATCH 0/14] xfs: Towards thin provisioning aware filesystems

On Fri, Nov 03, 2017 at 07:26:27AM -0400, Brian Foster wrote:
> On Fri, Nov 03, 2017 at 10:30:17AM +1100, Dave Chinner wrote:
> > On Thu, Nov 02, 2017 at 07:25:33AM -0400, Brian Foster wrote:
> > > On Thu, Nov 02, 2017 at 10:53:00AM +1100, Dave Chinner wrote:
> > > > On Wed, Nov 01, 2017 at 10:17:21AM -0400, Brian Foster wrote:
> > > > > On Wed, Nov 01, 2017 at 11:45:13AM +1100, Dave Chinner wrote:
> > > > > > On Tue, Oct 31, 2017 at 07:24:32AM -0400, Brian Foster wrote:
> > > > > > > On Tue, Oct 31, 2017 at 08:09:41AM +1100, Dave Chinner wrote:
> > > > > > > > On Mon, Oct 30, 2017 at 09:31:17AM -0400, Brian Foster wrote:
> > > > > > > > > On Thu, Oct 26, 2017 at 07:33:08PM +1100, Dave Chinner wrote:
> ...
> > > > > BTW, was there ever any kind of solution to the metadata block
> > > > > reservation issue in the thin case? We now hide metadata reservation
> > > > > from the user via the m_usable_blocks accounting. If m_phys_blocks
> > > > > represents a thin volume, how exactly do we prevent those metadata
> > > > > allocations/writes from overrunning what the admin has specified as
> > > > > "usable" with respect to the thin volume?
> > > > 
> > > > The reserved metadata blocks are not accounted against free space
> > > > when they are allocated - they are pulled from the reserve that
> > > > has already been removed from the free space.
> > > > 
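A minimal sketch of the accounting described above, using made-up
counter names rather than the real XFS fields:

	#include <errno.h>
	#include <stdint.h>

	struct counters {
		uint64_t	free_blocks;	/* user-visible free space */
		uint64_t	resv_blocks;	/* held back for metadata */
	};

	/* Setup: pull the reserve out of free space once, up front. */
	static void reserve_setup(struct counters *c, uint64_t want)
	{
		c->free_blocks -= want;
		c->resv_blocks += want;
	}

	/*
	 * Later metadata allocations consume the reserve, so the
	 * user-visible free space does not move a second time.
	 */
	static int metadata_alloc(struct counters *c)
	{
		if (!c->resv_blocks)
			return -ENOSPC;
		c->resv_blocks--;
		return 0;
	}
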
> > > 
> > > Ok, so the user can set a usable blocks value of something less than the
> > > fs geometry, then the reservation is pulled from that, reducing the
> > > reported "usable" value further. Hence, what ends up reported to the
> > > user is actually something less than the value set by the user, which
> > > means that the filesystem overall respects how much space the admin says
> > > it can use in the underlying volume.
> > > 
> > > For example, the user creates a 100T thin volume with 10T of usable
> > > space. The fs reserves a further 2T out of that for metadata, so then
> > > what the user sees is 8T of writeable space.  The filesystem itself
> > > cannot use more than 10T out of the volume, as instructed. Am I
> > > following that correctly? If so, that sounds reasonable to me from the
> > > "don't overflow my thin volume" perspective.
> > 
> > No, that's not what happens. For thick filesystems, the 100TB volume
> > gets 2TB pulled from it so it appears as a 98TB filesystem. This is
> > done by modifying the free block counts and m_usable_space when the
> > reservations are made.
> > 
> 
> Ok..
> 
> > For thin filesystems, we've already got 90TB of space "reserved",
> > and so the metadata reservations and allocations come from that.
> > i.e. we skip the modification of free block counts and
> > m_usable_space in the case of a thinspace filesystem, and so the
> > user still sees the 10TB of usable space they asked for.
> > 
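To sketch the thick/thin difference in rough pseudo-C (the structure
and field names here are approximations for illustration, not the
actual patch code):

	#include <stdbool.h>
	#include <stdint.h>

	struct mount {
		uint64_t	m_phys_blocks;		/* underlying volume size */
		uint64_t	m_usable_blocks;	/* admin-configured limit */
		bool		m_thinspace;
	};

	static void apply_metadata_reservation(struct mount *mp, uint64_t resv)
	{
		/*
		 * Thinspace: the gap between m_phys_blocks and
		 * m_usable_blocks is already held in reserve, so the
		 * metadata reservation comes out of that gap and the
		 * usable count the admin asked for is left untouched.
		 */
		if (mp->m_thinspace)
			return;

		/*
		 * Thick: the reservation is visibly carved out of
		 * usable space (e.g. a 100TB volume becomes a 98TB
		 * filesystem).
		 */
		mp->m_usable_blocks -= resv;
	}
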
> 
> Hmm.. so then I'm slightly confused about the thin use case with
> regard to preventing pool depletion. The usable blocks value that the
> user settles on is likely based on how much space the filesystem should
> use to safely avoid pool depletion.

I did say up front that the user data thinspace accounting would not
be an exact reflection of underlying storage pool usage. Things like
partially written blocks in the underlying storage pool mean write
amplification factors would need to be considered, but that's
something the admin already has to deal with in thinly provisioned
storage.

> If a usable value of 10T means the
> filesystem can write to the usable 10T + some amount of metadata
> reservation, how does the user determine a sane usable value based on
> the current pool geometry?

From an admin POV it's damn easy to document in admin guides that
actual space usage of a thinspace filesystem is going to be on the
order of 2% greater than the space given to the filesystem for user
data. Budget an overhead of 2-5% for internal management and the "small
amount of extra space for internal metadata" issue can be ignored.
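
For the 10TB example above, that guidance works out to roughly
(illustrative numbers only):

	user data space configured:	10TB
	metadata overhead at 2-5%:	0.2TB - 0.5TB
	pool space to budget for:	10.2TB - 10.5TB

The exact overhead still depends on workload and on write
amplification in the pool, as noted above.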

> > > The best I can read into the response here is that you think physical
> > > shrink is unlikely enough that we don't need to care very much about
> > > the interface confusion that could result from needing to rev the
> > > current growfs interface to support physical shrink on thin
> > > filesystems in the future. Is that a fair assessment..?
> > 
> > Not really. I understand just how complex a physical shrink
> > implementation is going to be, and have a fair idea of the sorts of
> > craziness we'll need to add to xfs_growfs to support/co-ordinate a
> > physical shrink operation.  From that perspective, I don't see a
> > physical shrink working with an unchanged growfs interface. The
> > discussion about whether or not we should physically shrink
> > thinspace filesystems is almost completely irrelevant to the
> > interface requirements of a physical shrink....
> 
> So it's not so much about the likelihood of realizing physical shrink,
> but rather the likelihood that physical shrink would require revving
> the growfs structure anyway (regardless of this feature).

Yup, pretty much.

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx