Re: Does your application depend on, or report, free disk space? Re: F20 Self Contained Change: OS Installer Support for LVM Thin Provisioning

Mike Snitzer <snitzer@xxxxxxxxxx> · Tue, 30 Jul 2013 23:34:17 -0400

On Mon, Jul 29 2013 at  2:49pm -0400,
Daniel P. Berrange <berrange at redhat.com> wrote:

> On Mon, Jul 29, 2013 at 02:38:23PM -0400, Ric Wheeler wrote:
> > On 07/29/2013 10:18 AM, Daniel P. Berrange wrote:
> > >On Mon, Jul 29, 2013 at 08:01:23AM -0600, Chris Murphy wrote:
> > >>On Jul 29, 2013, at 6:30 AM, "Daniel P. Berrange" <berrange at redhat.com> wrote:
> > >>
> > >>>Yep, we need to be able to report free space on filesystems, so that
> > >>>apps provisioning virtual machines can get an idea of how much storage
> > >>>they can provide to VMs without risk of overcommitting.
> > >>>
> > >>>I agree that we really want the kernel, or at least a reusable shared
> > >>>library, to provide some kind of interface to determine this, rather
> > >>>than requiring every userspace app which cares to re-invent the wheel.
> > >>What does it mean for an app to use stat to get free space, and then
> > >>proceed to create too big a VM image in a directory that has a quota
> > >>set? I still think apps are asking an inappropriate/unqualified question
> > >>by asking for volume free space, instead of what's available to them for
> > >>a specified path.
> > > From an API POV, libvirt doesn't need/care about the free space on the
> > >volume underlying the filesystem. We actually only care about the free
> > >space in a given directory that we're using for disk images. It just
> > >happens that we implement this using statvfs() currently. So when I
> > >ask for an API above, don't take this to mean I want a statvfs() that
> > >knows about sparse volumes. An API or syscall that provides free space
> > >for individual directories is fine with me.
> > >
> >
> > Just another note, it is never safe to assume that storage under any
> > file system is yours for the taking.
> > 
> > If application A does a stat or statvfs() call, sees 1GB of space
> > left, and then does a write, it could easily lose that race to any
> > other application.
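One way to narrow the race Ric describes is to ask the filesystem to reserve the space up front instead of trusting an earlier statvfs() result. Below is a minimal sketch using POSIX posix_fallocate(); reserve_space() is a hypothetical helper name. Caveat: when the filesystem sits on a thin volume, preallocation typically creates unwritten extents at the filesystem level without writing to the thin device, so the pool itself may still run out later.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: reserve `len` bytes at the start of the file behind
 * `fd` before writing, so a lack of space shows up here as ENOSPC rather
 * than partway through a later write.  Returns 0 on success or an errno
 * value on failure.  posix_fallocate() returns the error number directly;
 * it does not set errno. */
int reserve_space(int fd, off_t len)
{
    int err;

    assert(fd >= 0 && len > 0);

    do {
        err = posix_fallocate(fd, 0, len);
    } while (err == EINTR);  /* interrupted by a signal: try again */

    return err;
}
```

A caller would treat a nonzero return (most interestingly ENOSPC) as "do not start this write" instead of discovering the shortage midway.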
> 
> This race doesn't matter from libvirt's POV. It is just providing a
> mechanism via its API. It is up to the management application using
> libvirt to make use of the mechanism to provide a usage policy.
> Their usage scenario may well enable them to make certain assumptions
> about the storage that could not otherwise be made in a race-free
> manner.
> 
> In addition, even in more general-purpose usage scenarios, it does
> not necessarily matter if there is a race, because there can be a
> second line of defence. For example, KVM can be set to pause the VM
> upon ENOSPC errors, giving the management application or administrator
> the chance to expand the capacity of the underlying storage and then
> unpause the guest. In that case, checking the free space is mostly just
> a sanity check that serves to avoid hitting the pause-on-ENOSPC
> scenario too frequently.
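Daniel's second line of defence amounts to treating ENOSPC as "stop and wait for capacity" rather than as a fatal error. Here is a minimal sketch of that pattern for a single buffer. write_all_or_pause() and its status enum are hypothetical names, and this is not how QEMU/KVM implements pause-on-ENOSPC internally. On ENOSPC it returns with *done recording progress, so the caller can expand the storage and call again with the same arguments to resume.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

enum write_status {
    WRITE_FAILED = -1,        /* unrecoverable error; errno is set */
    WRITE_DONE = 0,           /* all bytes written */
    WRITE_PAUSED_ENOSPC = 1,  /* out of space; add capacity, then resume */
};

/* Hypothetical helper: write buf[*done..len) to fd.  On ENOSPC, stop and
 * report WRITE_PAUSED_ENOSPC with *done recording progress, so the caller
 * can expand the storage and call again with the same arguments. */
enum write_status write_all_or_pause(int fd, const void *buf, size_t len,
                                     size_t *done)
{
    const char *p = buf;

    assert(done != NULL && *done <= len);

    while (*done < len) {
        ssize_t n = write(fd, p + *done, len - *done);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            if (errno == ENOSPC)
                return WRITE_PAUSED_ENOSPC; /* pause instead of failing */
            return WRITE_FAILED;
        }
        *done += (size_t)n;
    }
    return WRITE_DONE;
}
```

Keeping the progress counter with the caller means a resumed call does not rewrite bytes that already landed.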

Running out of free space _should_ be extremely rare.  A properly
configured dm-thin pool will have adequate free space and an
appropriate low water mark, giving admins ample time to extend the pool
(even if a human has to do it).  And lvm2 can also autoextend the
thin-pool automatically, using free space in the parent volume group.
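The lvm2 autoextend behaviour is configured by two settings in the activation section of lvm.conf: thin_pool_autoextend_threshold and thin_pool_autoextend_percent. Below is a simplified C model of that policy. It assumes the documented semantics: once usage exceeds threshold percent, the pool is extended by percent of its current size, and a threshold of 100 disables autoextension. The model ignores extent rounding and the volume group's remaining free space, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of lvm2's thin-pool autoextend policy.
 * Returns the pool size after one monitoring check: unchanged if usage is
 * at or below `threshold` percent (or threshold is 100, which disables
 * autoextension), otherwise grown by `percent` of its current size.
 * Sizes are in arbitrary units (e.g. MiB). */
uint64_t thin_pool_size_after_check(uint64_t size, uint64_t used,
                                    unsigned threshold, unsigned percent)
{
    assert(size > 0 && used <= size && threshold <= 100);

    if (threshold >= 100)
        return size;                      /* autoextend disabled */
    if (used * 100 <= (uint64_t)threshold * size)
        return size;                      /* usage not above threshold */
    return size + size * percent / 100;   /* grow by percent of size */
}
```

For example, take a 100-unit pool with a threshold of 70 and a percent of 20. If it is 75 units full, it grows to 120 units. At exactly 70 units used, it is left alone.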

But I'm just talking about the not-really-chicken solution of leaning on
a properly configured system (either by admins in a data center or by
fedora developers with sane defaults).

As an aside, this extra free-space checking that KVM is doing is really
broken by design (polling sucks -- especially if this polling is
happening in the host for each guest).  It would be much better to
leverage something like lvm2 with a custom dmeventd plugin that fires
when it receives the low-water-mark and/or -ENOSPC event.
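The event-driven alternative relies on dm-thin pushing a notification when the pool's free data blocks drop below its low_water_mark. According to the kernel's thin-provisioning documentation, only one such event is sent. Below is a toy, in-process C model of that push style, as opposed to a monitor that repeatedly polls usage. Every name in it is hypothetical, and it is not the dmeventd plugin interface. Re-arming the event when capacity is added is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback invoked when free blocks fall below the mark. */
typedef void (*low_water_cb)(void *ctx, uint64_t free_blocks);

struct pool_model {
    uint64_t free_blocks;
    uint64_t low_water_mark;   /* in blocks, like dm-thin's table parameter */
    int armed;                 /* 1 until the event has been sent */
    unsigned events_sent;
    low_water_cb cb;           /* may be NULL */
    void *ctx;
};

void pool_init(struct pool_model *p, uint64_t free_blocks, uint64_t lwm,
               low_water_cb cb, void *ctx)
{
    p->free_blocks = free_blocks;
    p->low_water_mark = lwm;
    p->armed = 1;
    p->events_sent = 0;
    p->cb = cb;
    p->ctx = ctx;
}

/* Provision n blocks; returns 0, or -1 if the pool cannot satisfy the
 * request (the ENOSPC case).  Pushes at most one event per crossing. */
int pool_provision(struct pool_model *p, uint64_t n)
{
    assert(p != NULL);
    if (n > p->free_blocks)
        return -1;
    p->free_blocks -= n;
    if (p->armed && p->free_blocks < p->low_water_mark) {
        p->armed = 0;
        p->events_sent++;
        if (p->cb != NULL)
            p->cb(p->ctx, p->free_blocks);
    }
    return 0;
}

/* Model an admin or autoextend handler adding capacity; re-arms the event
 * once free space is back at or above the mark. */
void pool_extend(struct pool_model *p, uint64_t n)
{
    assert(p != NULL);
    p->free_blocks += n;
    if (p->free_blocks >= p->low_water_mark)
        p->armed = 1;
}
```

The monitor does no work until the event fires, so its cost no longer scales with how often each guest's storage is checked.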

Thinly provisioned volumes offer the prospect of doing away with this
polling -- which is why proper dm-thin integration has been on the virt
roadmap for a while.  It just never seems to happen.

Mike
-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct