On Mon, Jun 11, 2012 at 02:29:48PM -0500, Ben Myers wrote: > Testing has shown that there is a regression when using extent size > hints with NFS. There are many more extents created than expected. > > This is the workload, from a single NFS client: > # for e in `seq 1 8`; > do > mkdir -p $e; (cd $e; dd if=/dev/zero of=file bs=4096k count=100) & > done > > # xfs_io -c extsize . > [16777216] . > > Here is the extent count for each file without this patch: > > 385 387 387 387 400 398 379 334 So it is a 400MB file with roughly 400 1MB extents? IOWs, you're getting one extent per 1MB NFS write over the wire? Why aren't the extents being allocated adjacently? The allocator is supposed to taka the last block of the previous adjacent extent as the hint for the location of the new extent. This result implies that either the allocator is broken or you are crossing the streams. How many AGs do you have in your filesystem? I just ran that test with a filesystem that has more than 8 AGs, and the result it a single extent per file. When I have less than 8 AGs, there is contention at the AG level for allocation, and the result is allocation interleaving giving the above fragmentation levels. So the allocator is working just fine, it's just that your configuration is causing the streams to interleave, just like delalloc used to before the speculative prealloc beyond EOF changes in 2.6.38. (i.e. the I_DIRTY_RELEASE flag) > In a comparison of traces, one with the above workload run locally and > the other with the above workload over NFS, there were occasional calls > to xfs_free_eofblocks in NFS but not on local workloads. These turned > out to be from xfs_release. The reason that xfs_release() is doing this for NFS is that the I_DIRTY_RELEASE mitigation code triggers off delayed allocation blocks on the inode, and preallocation does not result in delalloc being present. > Not calling xfs_free_eofblocks when using the extent size hint resolves this > issue when using extsize hints and NFS. I suspect that delayed allocation > mitigated the effect of xfs_free_eofblocks until commit aff3a9ed. No, the I_DIRTY_RELEASE code is what mitigated the effect for delayed allocation. Using preallocation for extsize hints does not result in delalloc being present, hence this stopped working for such files. So this could be fixed simply by checking for dirty pages whenteh extentsize hint is set after eof truncation, just like we do for delalloc. i.e: if (ip->i_delayed_blks || ((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) && mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY))) xfs_iflags_set(ip, XFS_IDIRTY_RELEASE); Alternatively, (and I think a better solution) is to treat files with extent size hints on them exactly the same as files with preallocation on them. i.e. in xfs_set_diflags() we set XFS_DIFLAG_PREALLOC at the same time we set the XFS_DIFLAG_EXTSIZE, and then all our truncate logic just works as expected for physical preallocation beyond EOF on extsize hint files without changing anything else.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs