On Mon, 2010-10-04 at 21:13 +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > A long standing problem for streaming writeÑ through the NFS server > has been that the NFS server opens and closes file descriptors on an > inode for every write. The result of this behaviour is that the > ->release() function is called on every close and that results in > XFS truncating speculative preallocation beyond the EOF. This has > an adverse effect on file layout when multiple files are being > written at the same time - they interleave their extents and can > result in severe fragmentation. > > To avoid this problem, keep a count of the number of ->release calls > made on an inode. For most cases, an inode is only going to be opened > once for writing and then closed again during it's lifetime in > cache. Hence if there are multiple ->release calls, there is a good > chance that the inode is being accessed by the NFS server. Hence > count up every time ->release is called while there are delalloc > blocks still outstanding on the inode. > > If this count is non-zero when ->release is next called, then do no > truncate away the speculative preallocation - leave it there so that > subsequent writes do not need to reallocate the delalloc space. This > will prevent interleaving of extents of different inodes written > concurrently to the same AG. > > If we get this wrong, it is not a big deal as we truncate > speculative allocation beyond EOF anyway in xfs_inactive() when the > inode is thrown out of the cache. > > The new counter in the struct xfs_inode fits into a hole in the > structure on 64 bit machines, so does not grow the size of the inode > at all. This seems reasonable, and I have no real objection to it. However, I have a question and a comment related to the affected code (and not your specific change). > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/xfs_inode.h | 1 + > fs/xfs/xfs_vnodeops.c | 15 ++++++++++++++- > 2 files changed, 15 insertions(+), 1 deletions(-) > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h > index 1594190..82aad5e 100644 > --- a/fs/xfs/xfs_inode.h > +++ b/fs/xfs/xfs_inode.h > @@ -261,6 +261,7 @@ typedef struct xfs_inode { > xfs_fsize_t i_size; /* in-memory size */ > xfs_fsize_t i_new_size; /* size when write completes */ > atomic_t i_iocount; /* outstanding I/O count */ > + int i_dirty_releases; /* dirty ->release calls */ > > /* VFS inode */ > struct inode i_vnode; /* embedded VFS inode */ > diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c > index b7bdc43..0c8eeba 100644 > --- a/fs/xfs/xfs_vnodeops.c > +++ b/fs/xfs/xfs_vnodeops.c OK, this comment is unrelated to your exact change. But just above the next hunk there's a big nasty condition, which appears to be *almost* duplicated in xfs_inactive() (twice!). It would be very nice if, while you're at modifying this nearby code, you could encapsulate that condition in a macro that has a meaningful name. > @@ -979,14 +979,27 @@ xfs_release( > * chance to drop them once the last reference to > * the inode is dropped, so we'll never leak blocks > * permanently. I'm curious what the effect is if we simply don't do the truncate *except* when the inode becomes inactive. It means we hang onto the stuff for a while longer, and maybe it makes things messier in the event of a crash. Can you tell me why we do the truncate here as well as in xfs_inactive() (or what the problem is of *not* doing it here)? > + * > + * Further, count the number of times we get here in > + * the life of this inode. If the inode is being > + * opened, written and closed frequently and we have > + * delayed allocation blocks oustanding (e.g. streaming > + * writes from the NFS server), truncating the > + * blocks past EOF will cause fragmentation to occur. > + * In this case don't do the truncation, either. > */ > + if (ip->i_delayed_blks) > + ip->i_dirty_releases++; > + if (ip->i_dirty_releases > 1) > + goto out; > + > error = xfs_free_eofblocks(mp, ip, > XFS_FREE_EOF_TRYLOCK); > if (error) > return error; > } > } > - > +out: > return 0; > } > _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs