Re: [PATCH RFC fs] v2 Make sync() satisfy many requests with one invocation

Dave Chinner <david@xxxxxxxxxxxxx> · Sat, 27 Jul 2013 12:57:03 +1000

On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote:
> Dave Jones reported RCU stalls, overly long hrtimer interrupts, and
> amazingly long NMI handlers from a trinity-induced workload involving
> lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369).
> There are any number of things that one might do to make sync() behave
> better under high levels of contention, but it is also the case that
> multiple concurrent sync() system calls can be satisfied by a single
> sys_sync() invocation.
> 
> Given that this situation is reminiscent of rcu_barrier(), this commit
> applies the rcu_barrier() approach to sys_sync().  This approach uses
> a global mutex and a sequence counter.  The mutex is held across the
> sync() operation, which eliminates contention between concurrent sync()
> operations.  The counter is incremented at the beginning and end of
> each sync() operation, so that it is odd while a sync() operation is in
> progress and even otherwise, just like sequence locks.
> 
> The code that used to be in sys_sync() is now in do_sync(), and sys_sync()
> now handles the concurrency.  The sys_sync() function first takes a
> snapshot of the counter, then acquires the mutex, and then takes another
> snapshot of the counter.  If the values of the two snapshots indicate that
> a full do_sync() executed during the mutex acquisition, the sys_sync()
> function releases the mutex and returns ("Our work is done!").  Otherwise,
> sys_sync() increments the counter, invokes do_sync(), and increments
> the counter again.
> 
> This approach allows a single call to do_sync() to satisfy an arbitrarily
> large number of sync() system calls, which should eliminate issues due
> to large numbers of concurrent invocations of the sync() system call.
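
For reference, the snapshot/mutex/snapshot scheme described in the quoted
text can be sketched in userspace C roughly as follows. This is only an
illustration under stated assumptions, not the patch itself: the names
sync_seq, sync_mutex, coalesced_sync() and full_sync_since() are
hypothetical, a pthread mutex stands in for the kernel mutex, and
do_sync() is a stub that merely counts invocations. The comparison
follows the rcu_barrier()-style rule that a caller may skip its own
do_sync() only if a complete one both started and finished after the
caller's first snapshot.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; assumptions for illustration, not kernel symbols. */
static pthread_mutex_t sync_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_ulong sync_seq;        /* odd while a sync is in progress */
static unsigned long do_sync_calls;  /* protected by sync_mutex */

/* Stub for the real work; it only records that it ran. */
static void do_sync(void)
{
	do_sync_calls++;
}

/*
 * Return true if a complete do_sync() began and ended after "snap" was
 * taken.  If snap is odd, a sync was already running, so that one must
 * finish and a further full one must complete.  The subtraction makes
 * the comparison safe against counter wraparound.
 */
static bool full_sync_since(unsigned long snap, unsigned long now)
{
	unsigned long target = ((snap + 1) & ~1UL) + 2;

	return ULONG_MAX / 2 >= now - target;
}

/* Return true if this caller ran do_sync() itself, false if it was skipped. */
static bool coalesced_sync(void)
{
	unsigned long snap = atomic_load(&sync_seq);
	unsigned long now;

	pthread_mutex_lock(&sync_mutex);
	now = atomic_load(&sync_seq);
	assert((now & 1UL) == 0);  /* nobody is mid-sync while we hold the mutex */

	if (full_sync_since(snap, now)) {
		/* Someone else's do_sync() already covered this request. */
		pthread_mutex_unlock(&sync_mutex);
		return false;
	}

	atomic_fetch_add(&sync_seq, 1);  /* counter becomes odd */
	do_sync();
	atomic_fetch_add(&sync_seq, 1);  /* counter becomes even again */
	pthread_mutex_unlock(&sync_mutex);
	return true;
}
```

Note that in this sketch every caller that does run do_sync() does so
while holding the one global mutex, so only one do_sync() can be in
flight at any time.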

This is not addressing the problem that is causing issues during
sync. Indeed, it only puts a bandaid over the currently observed
trigger.

In fact, I suspect that this will significantly slow down concurrent
sync operations, as it serialises sync across all superblocks rather
than serialising per-superblock as is currently done. That
per-superblock serialisation is where all the lock contention
problems are. And it's not sync alone that causes the contention
problems - it has to be combined with other concurrent workloads
that add or remove inodes from the inode cache at the same time.

I have patches to address that by removing the source
of the lock contention completely, and not just for the sys_sync
trigger. Those patches make the problems with concurrent
sys_sync operation go away completely for me, not to mention improve
performance for 8+ thread metadata workloads on XFS significantly.

IOWs, I don't see that concurrent sys_sync operation is a problem at
all, and it is actively desirable for systems that have multiple
busy filesystems as it allows concurrent dispatch of IO across those
multiple filesystems. Serialising all sys_sync work might stop the
contention problems, but it will also slow down concurrent sync
operations on busy systems as it only allows one thread to dispatch
and wait for IO at a time.

So, let's not slap a bandaid over a symptom - let's address the
cause of the lock contention properly....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx