Hi Gents,

On Mon, Sep 30, 2013 at 05:24:54PM -0500, Eric Sandeen wrote:
> On 9/29/13 6:37 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Michael L Semon reported that generic/069 runtime increased on v5
> > superblocks by 100% compared to v4 superblocks. His perf-based
> > analysis pointed directly at the timestamp updates being done by the
> > write path in this workload. The append writers are doing 4-byte
> > writes, so there are lots of timestamp updates occurring.
> >
> > The thing is, they aren't being triggered by timestamp changes -
> > they are being triggered by the inode change counter needing to be
> > updated. That is, every write(2) system call needs to bump the inode
> > version count, and it does that through the timestamp update
> > mechanism. Hence for v5 filesystems, test generic/069 is running 3
> > orders of magnitude more timestamp update transactions on v5
> > filesystems due to the fact it does a huge number of *4 byte*
> > write(2) calls.
> >
> > This isn't a real world scenario we really need to address - anyone
> > doing such sequential IO should be using fwrite(3), not write(2).
> > i.e. fwrite(3) buffers the writes in userspace to minimise the
> > number of write(2) syscalls, and the problem goes away.
> >
> > However, there is a small change we can make to improve the
> > situation - removing the expensive lock operation on the change
> > counter update. All inode version counter changes in XFS occur
> > under the ip->i_ilock during a transaction, and therefore we
> > don't actually need the spin lock that provides exclusive access to
> > it through inode_inc_iversion().
> >
> > Hence avoid the lock and just open code the increment ourselves when
> > logging the inode.
> >
> > Reported-by: Michael L. Semon <mlsemon35@xxxxxxxxx>
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_trans_inode.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> > index 53dfe46..e6601c1 100644
> > --- a/fs/xfs/xfs_trans_inode.c
> > +++ b/fs/xfs/xfs_trans_inode.c
> > @@ -118,8 +118,7 @@ xfs_trans_log_inode(
> >  	 */
> >  	if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
> >  	    IS_I_VERSION(VFS_I(ip))) {
> > -		inode_inc_iversion(VFS_I(ip));
> > -		ip->i_d.di_changecount = VFS_I(ip)->i_version;
>
> comment about the reason for the open-code might be good, too?
>
> otherwise some semantic patcher might "fix" it for you again later...
>
> -Eric
>
> > +		ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
> >  		flags |= XFS_ILOG_CORE;
> >  	}
> >
> >

Adding a comment strikes me as a good idea too...

But isn't that lock there for a reason? I suspect that will break
i_version like i_size on 32-bit systems. Jean added this function;
hopefully he can shed some light.

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
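
To make the 32-bit concern above concrete, here is a minimal userspace
sketch. It is not kernel code, and every name in it (split_counter,
observation, increment_with_reader) is hypothetical. It assumes a machine
that stores a 64-bit counter as two 32-bit words and updates them with two
separate stores, and it models a lockless reader that samples the counter
between those stores. Whether this can actually happen to i_version depends
on whether all readers hold ip->i_ilock, which this thread has not settled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit counter held as two 32-bit words. */
struct split_counter {
	uint32_t lo;
	uint32_t hi;
};

/* What a lockless reader sees before, during and after one increment. */
struct observation {
	uint64_t before;
	uint64_t during;
	uint64_t after;
};

/* Combine the two words the way a 64-bit load would on such a machine. */
uint64_t read_counter(const struct split_counter *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

/*
 * Increment the counter in two steps, sampling it in between to stand in
 * for a reader that runs without the lock while the writer is halfway
 * through the update.
 */
struct observation increment_with_reader(struct split_counter *c)
{
	struct observation obs;

	assert(c != NULL);

	obs.before = read_counter(c);

	c->lo++;				/* first store: low word */
	obs.during = read_counter(c);		/* unlocked reader runs here */

	if (c->lo == 0)				/* low word wrapped: carry */
		c->hi++;			/* second store: high word */
	obs.after = read_counter(c);

	return obs;
}
```

Starting from lo = 0xFFFFFFFF and hi = 0, the sketch reports before =
0xFFFFFFFF, during = 0 and after = 0x100000000, so the reader in the middle
sees the counter appear to go backwards. When the low word does not wrap,
the intermediate value already equals the final one and nothing is torn.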