On Sat, Aug 05, 2017 at 09:46:15AM +1000, Dave Chinner wrote: > On Fri, Aug 04, 2017 at 01:33:12PM -0700, Darrick J. Wong wrote: > > On Thu, Aug 03, 2017 at 07:28:30PM -0700, Dan Williams wrote: > > > Add an on-disk inode flag to record the state of the S_IOMAP_IMMUTABLE > > > in-memory vfs inode flags. This allows the protections against reflink > > > and hole punch to be automatically restored on a sub-sequent boot when > > > the in-memory inode is established. > > > > > > The FS_XFLAG_IOMAP_IMMUTABLE is introduced to allow xfs_io to read the > > > state of the flag, but toggling the flag requires going through > > > fallocate(FALLOC_FL_[UN]SEAL_BLOCK_MAP). Support for toggling this > > > on-disk state is saved for a later patch. > > > > > > Cc: Jan Kara <jack@xxxxxxx> > > > Cc: Jeff Moyer <jmoyer@xxxxxxxxxx> > > > Cc: Christoph Hellwig <hch@xxxxxx> > > > Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx> > > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> > > > Suggested-by: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> > > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > > --- > > > fs/xfs/libxfs/xfs_format.h | 5 ++++- > > > fs/xfs/xfs_inode.c | 2 ++ > > > fs/xfs/xfs_ioctl.c | 1 + > > > fs/xfs/xfs_iops.c | 8 +++++--- > > > include/uapi/linux/fs.h | 1 + > > > 5 files changed, 13 insertions(+), 4 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > > index d4d9bef20c3a..9e720e55776b 100644 > > > --- a/fs/xfs/libxfs/xfs_format.h > > > +++ b/fs/xfs/libxfs/xfs_format.h > > > @@ -1063,12 +1063,15 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev) > > > #define XFS_DIFLAG2_DAX_BIT 0 /* use DAX for this inode */ > > > #define XFS_DIFLAG2_REFLINK_BIT 1 /* file's blocks may be shared */ > > > #define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */ > > > +#define XFS_DIFLAG2_IOMAP_IMMUTABLE_BIT 3 /* set S_IOMAP_IMMUTABLE for this inode */ > > > > So... the greedy part of my brain that doesn't want to give out flags2 > > bits has been wondering, > > FWIW, I made di_flags2 a 64 bit value in the first place precisely > so we didn't have a scarcity problem and can just give out flag > bits for enabling new functionality like this... Ok. That's what I thought. > > what if we just didn't have an on-disk > > IOMAP_IMMUTABLE bit, and set FS_XFLAG based only on the in-core > > S_IOMAP_IMMUTABLE bit? If a program wants the immutable iomap > > semantics, they will have to code some variant on the following: > > > > fd = open(...); > > ret = fallocate(fd, FALLOC_FL_SEAL_BLOCK_MAP, 0, len...) > > if (ret) { > > printf("couldn't seal block map"); > > close(fd); > > return; > > } > > > > mmap(fd...); > > /* do sensitive io operations here */ > > munmap(fd...); > > > > close(fd); > > > > Therefore the cost of not having the on-disk flag is that we'll have to > > do more unshare/alloc/test/set cycles than we would if we could remember > > the iomap-immutable state across unmounts and inode reclaiming. > > However, if the data map is already ready to go, this shouldn't have a > > lot of overhead since we only have to iterate the in-core extents. > > > > Just trying to make sure we /need/ the inode flag bit. :) > > IMO, fallocate() is for making permanent changes to file extents. If > this is not going to be a permanent state change but only a > runtime-while-the-inode-is-in-cache flag, then it's probably not the > right interface to use. > > This also seems problematic for applications other than DAX where > the block map may be sealed, the fd closed and access handed off to > another entity for remote storage access. If the inode gets > reclaimed due to memory pressure, the system loses the fact that > that the inode has been sealed. Hence another process can come > along, re-read the inode and modify the block map because it hasn't > been sealed in this new cache life cycle..... <nod> Ok, I'm convinced. :) --D > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html