Re: [PATCH RFC] fs: add FIEMAP_FLAG_DISCARD support

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 5 Nov 2013 15:36:25 +1100

On Mon, Nov 04, 2013 at 07:51:46PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 04, 2013 at 02:03:43AM -0800, Christoph Hellwig wrote:
> > 
> > Besides that I really miss an explanation what the intended use cases
> > are.  What does this buy us over punching a hole on an actual real
> > workload?  Where is the overhead?  Is it our shitty discard
> > implementation? If so there's tons of low hanging fruit to fix there
> > anyway that we shouldn't work around by interfaces taking shortcuts.
> > Is it problems in ext4's extent management on hole punch?  Is the
> > bit of metadata created when doing an actual hole punch too much for
> > that very specific workload?
> 
> The an application in question wants to treat a large file as if it
> were a block device --- that's hardly unprecedented; enterprise
> databases tend to prefer using raw block devices (at least for
> benchmarking purposes), but system administrators like to
> administrative convenience of using a file system.
> 
> The goal here is get the performace as close to a raw block device as
> possible.  Especially if you are using fast flash, the overhead of
> deallocating blocks using punch, only to reallocate the blocks when we
> later write into them, is just unnecessary overhead.  Also, if you
> deallocate the blocks, they could end up getting grabbed by some other
> block allocation, which means the file can end up getting very
> fragmented --- which doesn't matter that much for flash, I suppose,
> but it means the extent tree could end up growing and getting nasty
> over time.  The bottom line is why bother doing extra work when it's
> not necessary?

commit 447223520520b17d3b6d0631aa4838fbaf8eddb4
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Aug 24 12:02:11 2010 +1000

    xfs: Introduce XFS_IOC_ZERO_RANGE

    XFS_IOC_ZERO_RANGE is the equivalent of an atomic XFS_IOC_UNRESVSP/
    XFS_IOC_RESVSP call pair. It enabled ranges of written data to be
    turned into zeroes without requiring IO or having to free and
    reallocate the extents in the range given as would occur if we had
    to punch and then preallocate them separately.  This enables
    applications to zero parts of files very quickly without changing
    the layout of the files in any way.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

Sounds pretty much like the same reasons to me. IOWs, we did this
more than 3 years ago on XFS but discard was not a requirement for
the cloudy guy who asked for it so we simply didn't implement it.

I agree that per-file discard is useful, but it needs to have well
defined byte-range semantics and prevent stale data exposure to
unprivileged users. FALLOC_FL_ZERO_RANGE has those semantics, while
FALLOC_FL_ZERO_RANGE|FALLOC_FL_NO_HIDE_STALE gives google the
trigger to issue discards without caring about what you get back
from the discards....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html