Re: Making discard/fstrim reliable

Lukáš Czerner <lczerner@xxxxxxxxxx> · Thu, 3 Apr 2014 19:08:34 +0200 (CEST)

On Wed, 26 Mar 2014, Richard W.M. Jones wrote:

> Date: Wed, 26 Mar 2014 20:47:08 +0000
> From: Richard W.M. Jones <rjones@xxxxxxxxxx>
> To: linux-fsdevel@xxxxxxxxxxxxxxx
> Cc: pbonzini@xxxxxxxxxx
> Subject: Making discard/fstrim reliable
> 
> 
> virt-sparsify is a tool for trimming free space in virtual disk
> images.  The new implementation uses vfs/kernel/qemu discard support.
> Essentially it does:
> 
>   for each filesystem:
>     mount -o discard $fs /mnt
>     sync
>     fstrim /mnt
>     umount /mnt
>   sync
>   # qemu is killed after sync returns
> 
> Although typing these commands by hand works fine, when you run them
> from a program the fstrim doesn't happen all the way down the stack
> reliably.  Mostly it works, but sometimes it only trims some space
> from the host file.
> 
> It appears that when the host is slow / under load, the problem
> happens more frequently.  Also it may happen more frequently on i686
> than on x86-64 (possibly also due to speed of host).
> 
> The question is: What can I do to make sure the trim happens reliably,
> all the way down the stack, before qemu is killed?
> 
> I am testing this using the latest upstream kernel & qemu.
> 
> Rich.

There is really no reliability to be had with discard. It's an
advisory interface, not every file system implements it, and when one
does, the implementation and hence the results vary wildly.

I'd suggest not doing things this way.

However, let's take a look at your case. In order to determine why you
think it's unreliable I'd need some data to back it up: what the file
system looks like (an image would be great), when and how it was
created, what its size is, what the image size is and what size
difference you expect. Also, what file system type is this?
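
One way to collect the size numbers asked for above is to compare the
apparent size of the image file with the space actually allocated for
it on the host. A minimal sketch, assuming GNU stat from coreutils;
report_alloc is a made-up helper name, not an existing tool:

```shell
# report_alloc FILE: print apparent size and allocated size in bytes.
# Assumes GNU coreutils stat (-c with %s, %b and %B).
report_alloc() {
    apparent=$(stat -c %s "$1") || return 1   # length of the file
    blocks=$(stat -c %b "$1") || return 1     # number of allocated blocks
    bsize=$(stat -c %B "$1") || return 1      # size of each reported block
    allocated=$((blocks * bsize))
    echo "apparent=$apparent allocated=$allocated"
}
```

Running it on the image before and after the trim (for example
"report_alloc disk.img", where disk.img is a placeholder name) should
show how much space was actually given back to the host.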

However, if we're talking about raw file system images in files on
the host, then a much better solution would be to use fsck. For ext4,
e2fsck already has the option -E discard, which will send a discard
down for every free range (similar to what fstrim would do on a
mounted file system). I suspect that other fs utilities might have
similar functionality.

Of course, in order for it to work you need a layer that translates
discard requests into hole punching on the underlying file system
(such as a loop device, for example). But I think that if there is
enough interest we might do this directly from e2fsck when we notice
that we're running on a file rather than a block device.
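
The loop device route could look roughly like the following. This is
only a sketch: it needs root, assumes an unmounted ext4 image and a
loop driver that passes discards down as hole punches, and
trim_image_with_e2fsck is a hypothetical helper name.

```shell
# trim_image_with_e2fsck IMAGE: discard free ranges of an unmounted ext4
# image by running e2fsck on a loop device that is backed by the image.
trim_image_with_e2fsck() {
    img=$1
    # Attach the image to the first free loop device and print its name.
    dev=$(losetup --find --show "$img") || return 1
    # -f forces a full check, -p repairs without prompting, and
    # -E discard discards free blocks once the check has finished.
    rc=0
    e2fsck -f -p -E discard "$dev" || rc=$?
    losetup --detach "$dev"
    # e2fsck exits with 0 (no errors) or 1 (errors corrected) on success.
    [ "$rc" -le 1 ]
}
```

The image must not be mounted or in use by qemu while this runs.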

Also please note that mke2fs will issue the initial discard by
default, so if you create the file system and then run fstrim on it
with the expectation that the size of the backing file will go down,
you would be wrong. It was already trimmed down at file system
creation time.

All that said, while discard is an interesting functionality and can
be abused in many _many_ ways, it looks like what you really need is
something that is currently available in fallocate(1) from the
util-linux package. The option to look for is --dig-holes:

-d, --dig-holes
      Detect and dig holes. Makes the file sparse in-place, without using extra disk space.
      The minimal size of the hole depends on filesystem I/O block size (usually 4096 bytes).
      Also, when using this option, --keep-size is implied.

      You can think of this as doing a "cp --sparse" and renaming the dest file as the
      original, without the need for extra disk space.
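
As a rough illustration of how that option might be used on an image
file, the sketch below digs holes and prints the allocated size before
and after. It assumes a util-linux fallocate that already has
--dig-holes and a host file system that supports hole punching;
allocated_kib and dig_holes_report are made-up helper names.

```shell
# allocated_kib FILE: space allocated for FILE on the host, in KiB.
allocated_kib() {
    du -k "$1" | cut -f1
}

# dig_holes_report FILE: turn zero-filled ranges of FILE into holes,
# in place, and report the allocated size before and after.
dig_holes_report() {
    before=$(allocated_kib "$1") || return 1
    fallocate --dig-holes "$1" || return 1
    after=$(allocated_kib "$1") || return 1
    echo "allocated: ${before} KiB before, ${after} KiB after"
}
```

Note that this only helps for ranges that already contain zeroes, and
the image should not be in use while it runs.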

I am not sure whether a util-linux version with this functionality has
been released yet, but you can always check out the git repository:

https://github.com/karelzak/util-linux.git

I hope it helps.

Thanks!
-Lukas