Re: [PATCH 2/2] xfs: remove the m_active_trans counter

On Wed, May 20, 2020 at 01:47:54PM -0700, Darrick J. Wong wrote:
> On Wed, May 20, 2020 at 08:23:10AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > It's a global atomic counter, and we are hitting it at a rate of
> > half a million transactions a second, so it's bouncing the counter
> > cacheline all over the place on large machines. We don't actually
> > need it anymore - it used to be required because the VFS freeze code
> > could not track/prevent filesystem transactions that were running,
> > but that problem no longer exists.
> > 
> > Hence to remove the counter, we simply have to ensure that nothing
> > calls xfs_sync_sb() while we are trying to quiesce the filesystem.
> > That only happens if the log worker is still running when we call
> > xfs_quiesce_attr(). The log worker is cancelled at the end of
> > xfs_quiesce_attr() by calling xfs_log_quiesce(), so just call it
> > early here and then we can remove the counter altogether.
> > 
> > Concurrent create, 50 million inodes, identical 16p/16GB virtual
> > machines on different physical hosts. Machine A has twice the CPU
> > cores per socket of machine B:
> > 
> > 		unpatched	patched
> > machine A:	3m16s		2m00s
> > machine B:	4m04s		4m05s
> > 
> > Create rates:
> > 		unpatched	patched
> > machine A:	282k+/-31k	468k+/-21k
> > machine B:	231k+/-8k	233k+/-11k
> > 
> > Concurrent rm of same 50 million inodes:
> > 
> > 		unpatched	patched
> > machine A:	6m42s		2m33s
> > machine B:	4m47s		4m47s
> > 
> > The transaction rate on the fast machine went from just under
> > 300k/sec to 700k/sec, which indicates just how much of a bottleneck
> > this atomic counter was.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> /me kinda wonders why removing the counter entirely has so little effect
> on machine B, but seeing as I've been pondering killing this counter
> myself for years,

Because the transaction rate on machine B isn't high enough for the
cacheline bouncing to become a limiting factor.

Don't forget that the impact of cacheline bouncing is exponential:
there is a very small window where it goes from "none at all" to
"the whole machine", and these two machines sit on either side of
that threshold. That is, the older machine is too slow to hit that
threshold, while the newer machine hits it easily.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


