On Sat, Jul 27, 2013 at 12:57:03PM +1000, Dave Chinner wrote: > On Fri, Jul 26, 2013 at 04:28:52PM -0700, Paul E. McKenney wrote: > > Dave Jones reported RCU stalls, overly long hrtimer interrupts, and > > amazingly long NMI handlers from a trinity-induced workload involving > > lots of concurrent sync() calls (https://lkml.org/lkml/2013/7/23/369). > > There are any number of things that one might do to make sync() behave > > better under high levels of contention, but it is also the case that > > multiple concurrent sync() system calls can be satisfied by a single > > sys_sync() invocation. > > > > Given that this situation is reminiscent of rcu_barrier(), this commit > > applies the rcu_barrier() approach to sys_sync(). This approach uses > > a global mutex and a sequence counter. The mutex is held across the > > sync() operation, which eliminates contention between concurrent sync() > > operations. > > > > The counter is incremented at the beginning and end of > > each sync() operation, so that it is odd while a sync() operation is in > > progress and even otherwise, just like sequence locks. > > > > The code that used to be in sys_sync() is now in do_sync(), and sys_sync() > > now handles the concurrency. The sys_sync() function first takes a > > snapshot of the counter, then acquires the mutex, and then takes another > > snapshot of the counter. If the values of the two snapshots indicate that > > a full do_sync() executed during the mutex acquisition, the sys_sync() > > function releases the mutex and returns ("Our work is done!"). Otherwise, > > sys_sync() increments the counter, invokes do_sync(), and increments > > the counter again. > > > > This approach allows a single call to do_sync() to satisfy an arbitrarily > > large number of sync() system calls, which should eliminate issues due > > to large numbers of concurrent invocations of the sync() system call. > > This is not addressing the problem that is causing issues during > sync. Indeed, it only puts a bandaid over the currently observed > trigger. > > Indeed, i suspect that this will significantly slow down concurrent > sync operations, as it serialised sync across all superblocks rather > than serialising per-superblock like is currently done. Indeed, that > per-superblock serialisation is where all the lock contention > problems are. And it's not sync alone that causes the contention > problems - it has to be combined with other concurrent workloads > that add or remove inodes from the inode cache at tha same time. Seems like something along the lines of wakeup_flusher_threads() currently at the start of sys_sync() would address this. > I have patches to address that by removing the source > of the lock contention completely, and not just for the sys_sync > trigger. Those patches make the problems with concurrent > sys_sync operation go away completely for me, not to mention improve > performance for 8+ thread metadata workloads on XFS significantly. > > IOWs, I don't see that concurrent sys_sync operation is a problem at > all, and it is actively desirable for systems that have multiple > busy filesystems as it allows concurrent dispatch of IO across those > multiple filesystems. Serialising all sys_sync work might stop the > contention problems, but it will also slow down concurrent sync > operations on busy systems as it only allows one thread to dispatch > and wait for IO at a time. > > So, let's not slap a bandaid over a symptom - let's address the > cause of the lock contention properly.... Hmmm... Could you please send your patches over to Dave Jones right now? I am getting quite tired of getting RCU CPU stall warning complaints from him that turn out to be due to highly contended sync() system calls. Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html