Re: [PATCH 3/5] vfs: add a zero-initialization mode to fallocate

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21/09/21 10:44AM, Dave Chinner wrote:
> On Fri, Sep 17, 2021 at 06:31:01PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> >
> > Add a new mode to fallocate to zero-initialize all the storage backing a
> > file.
> >
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  fs/open.c                   |    5 +++++
> >  include/linux/falloc.h      |    1 +
> >  include/uapi/linux/falloc.h |    9 +++++++++
> >  3 files changed, 15 insertions(+)
> >
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index daa324606a41..230220b8f67a 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -256,6 +256,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >  	    (mode & ~FALLOC_FL_INSERT_RANGE))
> >  		return -EINVAL;
> >
> > +	/* Zeroinit should only be used by itself and keep size must be set. */
> > +	if ((mode & FALLOC_FL_ZEROINIT_RANGE) &&
> > +	    (mode != (FALLOC_FL_ZEROINIT_RANGE | FALLOC_FL_KEEP_SIZE)))
> > +		return -EINVAL;
> > +
> >  	/* Unshare range should only be used with allocate mode. */
> >  	if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
> >  	    (mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
> > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > index f3f0b97b1675..4597b416667b 100644
> > --- a/include/linux/falloc.h
> > +++ b/include/linux/falloc.h
> > @@ -29,6 +29,7 @@ struct space_resv {
> >  					 FALLOC_FL_PUNCH_HOLE |		\
> >  					 FALLOC_FL_COLLAPSE_RANGE |	\
> >  					 FALLOC_FL_ZERO_RANGE |		\
> > +					 FALLOC_FL_ZEROINIT_RANGE |	\
> >  					 FALLOC_FL_INSERT_RANGE |	\
> >  					 FALLOC_FL_UNSHARE_RANGE)
> >
> > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > index 51398fa57f6c..8144403b6102 100644
> > --- a/include/uapi/linux/falloc.h
> > +++ b/include/uapi/linux/falloc.h
> > @@ -77,4 +77,13 @@
> >   */
> >  #define FALLOC_FL_UNSHARE_RANGE		0x40
> >
> > +/*
> > + * FALLOC_FL_ZEROINIT_RANGE is used to reinitialize storage backing a file by
> > + * writing zeros to it.  Subsequent read and writes should not fail due to any
> > + * previous media errors.  Blocks must be not be shared or require copy on
> > + * write.  Holes and unwritten extents are left untouched.  This mode must be
> > + * used with FALLOC_FL_KEEP_SIZE.
> > + */
> > +#define FALLOC_FL_ZEROINIT_RANGE	0x80
>
> Hmmmm.
>
> I think this wants to be a behavioural modifier for existing
> operations rather than an operation unto itself. i.e. similar to how
> KEEP_SIZE modifies ALLOC behaviour but doesn't fundamentally alter
> the guarantees ALLOC provides userspace.
>
> In this case, the change of behaviour over ZERO_RANGE is that we
> want physical zeros to be written instead of the filesystem
> optimising away the physical zeros by manipulating the layout
> of the file.
>
> There's been requests in the past for a way to make ALLOC also
> behave like this - in the case that users want fully allocated space
> to be preallocated so their applications don't take unwritten extent
> conversion penalties on first writes. Databases are an example here,
> where setup of a new WAL file isn't performance critical, but writes
> to the WAL are and the WAL files are write-once. Hence they always
> take unwritten conversion penalties and the only way around that is
> to physically zero the files before use...
>
> So it seems to me what we actually need here is a "write zeroes"
> modifier to fallocate() operations to tell the filesystem that the
> application really wants it to write zeroes over that range, not
> just guarantee space has been physically allocated....
>
> Then we have and API that looks like:
>
> 	ALLOC		- allocate space efficiently
> 	ALLOC | INIT	- allocate space by writing zeros to it
> 	ZERO		- zero data and preallocate space efficiently
> 	ZERO | INIT	- zero range by writing zeros to it
>
> Which seems to cater for all the cases I know of where physically
> writing zeros instead of allocating unwritten extents is the
> preferred behaviour of fallocate()....
>

If that's the case we can just have FALLOC_FL_ZEROWRITE_RANGE?
Where FALLOC_FL_ZERO_RANGE & FALLOC_FL_ZEROWRITE_RANGE are mutually exclusive.

AFAIU,
/* FALLOC_FL_ZERO_RANGE may optimize the underlying blocks with unwritten
 * extents if the filesystem allows so, but with FALLOC_FL_ZEROWRITE_RANGE,
 * the underlying blocks are guranteed to be written with zeros.
 * In case of hole it will be preallocated with written extents and will be
 * initialized with zeroes. If FALLOC_FL_KEEP_SIZE is specified then the
 * inode size will remain the same.
 *
 * Essentially similar to FALLOC_FL_ZERO_RANGE but with gurantees that
 * underlying storage has written extents initialized with zeroes.
 */
#define FALLOC_FL_ZEROWRITE_RANGE		0x80

Does that make sense?

-ritesh



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux