Currently, __sb_start_write() and freeze_super() can race with each
other. __sb_start_write() uses an smp_mb() to ensure that freeze_super()
can see its write to sb->s_writers.counter and that it can see
freeze_super()'s update to sb->s_writers.frozen. This all seems to
work fine.

But this smp_mb() makes __sb_start_write() the single hottest function
in the kernel if I sit in a loop and do tiny write()s to tmpfs over and
over. This is on a very small 2-core system, so it will only get worse
on larger systems.

This _seems_ like an ideal case for RCU. __sb_start_write() is the RCU
read-side and is in a very fast, performance-sensitive path.
freeze_super() is the RCU writer and is in an extremely rare,
non-performance-sensitive path.

Instead of doing an smp_mb() in __sb_start_write(), we do
rcu_read_lock(). This ensures that a CPU doing freeze_super() can not
proceed past its synchronize_rcu() until the grace period has ended and
the 's_writers.frozen = SB_FREEZE_WRITE' is visible to
__sb_start_write().

One question here: Does the work that __sb_start_write() does in a
previous grace period become visible to freeze_super() after its call
to synchronize_rcu()? It _seems_ like it should, but it seems backwards
to me since __sb_start_write() is the "reader" in this case.

This patch increases the number of writes/second that I can do by 10.4%.

Does anybody see any holes with this?

Cc: Jan Kara <jack@xxxxxxx>
Cc: Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: Paul E.
McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
---

 b/fs/super.c |   38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff -puN fs/super.c~rcu-__sb_start_write fs/super.c
--- a/fs/super.c~rcu-__sb_start_write	2015-06-19 14:50:53.081869092 -0700
+++ b/fs/super.c	2015-06-19 15:19:03.000473047 -0700
@@ -1190,27 +1190,25 @@ static void acquire_freeze_lock(struct s
  */
 int __sb_start_write(struct super_block *sb, int level, bool wait)
 {
-retry:
-	if (unlikely(sb->s_writers.frozen >= level)) {
+	/*
+	 * RCU keeps freeze_super() from proceeding
+	 * while we are in here.
+	 */
+	rcu_read_lock();
+	while (unlikely(sb->s_writers.frozen >= level)) {
+		rcu_read_unlock();
 		if (!wait)
-			return 0;
+			return 0;
 		wait_event(sb->s_writers.wait_unfrozen,
 			   sb->s_writers.frozen < level);
+		rcu_read_lock();
 	}
 
 #ifdef CONFIG_LOCKDEP
 	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
 #endif
 	percpu_counter_inc(&sb->s_writers.counter[level-1]);
-	/*
-	 * Make sure counter is updated before we check for frozen.
-	 * freeze_super() first sets frozen and then checks the counter.
-	 */
-	smp_mb();
-	if (unlikely(sb->s_writers.frozen >= level)) {
-		__sb_end_write(sb, level);
-		goto retry;
-	}
+	rcu_read_unlock();
 	return 1;
 }
 EXPORT_SYMBOL(__sb_start_write);
@@ -1312,7 +1310,13 @@ int freeze_super(struct super_block *sb)
 
 	/* From now on, no new normal writers can start */
 	sb->s_writers.frozen = SB_FREEZE_WRITE;
-	smp_wmb();
+	/*
+	 * After we synchronize_rcu(), we have ensured that everyone
+	 * who reads sb->s_writers.frozen under rcu_read_lock() can
+	 * now see our update.  This pretty much means that
+	 * __sb_start_write() will not allow any new writers.
+	 */
+	synchronize_rcu();
 
 	/* Release s_umount to preserve sb_start_write -> s_umount ordering */
 	up_write(&sb->s_umount);
@@ -1322,7 +1326,7 @@ int freeze_super(struct super_block *sb)
 	/* Now we go and block page faults... */
 	down_write(&sb->s_umount);
 	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
-	smp_wmb();
+	synchronize_rcu();
 
 	sb_wait_write(sb, SB_FREEZE_PAGEFAULT);
 
@@ -1331,7 +1335,7 @@ int freeze_super(struct super_block *sb)
 
 	/* Now wait for internal filesystem counter */
 	sb->s_writers.frozen = SB_FREEZE_FS;
-	smp_wmb();
+	synchronize_rcu();
 	sb_wait_write(sb, SB_FREEZE_FS);
 
 	if (sb->s_op->freeze_fs) {
@@ -1340,7 +1344,7 @@ int freeze_super(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem freeze failed\n");
 			sb->s_writers.frozen = SB_UNFROZEN;
-			smp_wmb();
+			synchronize_rcu();
 			wake_up(&sb->s_writers.wait_unfrozen);
 			deactivate_locked_super(sb);
 			return ret;
@@ -1387,7 +1391,7 @@ int thaw_super(struct super_block *sb)
 
 out:
 	sb->s_writers.frozen = SB_UNFROZEN;
-	smp_wmb();
+	synchronize_rcu();
 	wake_up(&sb->s_writers.wait_unfrozen);
 	deactivate_locked_super(sb);
 
_
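
For anyone who wants to poke at the ordering outside the kernel, the
counter/frozen handshake that the existing smp_mb() provides can be
modeled in userspace. This is only a sketch under stated assumptions,
not kernel code: it uses C11 atomics and pthreads, the names
model_start_write(), model_freeze() and run_trials() are hypothetical,
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(), and
the freezer side uses a full fence too, which is stronger than the
smp_wmb() that freeze_super() has today. It does not model RCU at all.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for sb->s_writers.counter and .frozen. */
static atomic_int counter;
static atomic_int frozen;

/* Writer side: publish the count, full barrier, then check frozen. */
static int model_start_write(void)
{
	atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* models smp_mb() */
	if (atomic_load_explicit(&frozen, memory_order_relaxed)) {
		/* Lost the race: back out, as __sb_end_write() would. */
		atomic_fetch_sub_explicit(&counter, 1, memory_order_relaxed);
		return 0;
	}
	return 1;
}

/* Freezer side: set frozen, full barrier, then sample the counter. */
static int model_freeze(void)
{
	atomic_store_explicit(&frozen, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* assumption: full fence */
	return atomic_load_explicit(&counter, memory_order_relaxed);
}

static void *writer_thread(void *arg)
{
	*(int *)arg = model_start_write();
	return NULL;
}

static void *freezer_thread(void *arg)
{
	*(int *)arg = model_freeze();
	return NULL;
}

/* Race one writer against one freezer n times; count violations. */
int run_trials(int n)
{
	int violations = 0;

	for (int i = 0; i < n; i++) {
		pthread_t w, f;
		int admitted = 0, seen = 0;

		atomic_store(&counter, 0);
		atomic_store(&frozen, 0);
		pthread_create(&w, NULL, writer_thread, &admitted);
		pthread_create(&f, NULL, freezer_thread, &seen);
		pthread_join(w, NULL);
		pthread_join(f, NULL);
		/* A writer got in, yet the freezer saw no writers. */
		if (admitted && seen == 0)
			violations++;
	}
	return violations;
}
```

Built with -pthread and called from a main(), run_trials() should never
report a violation: whichever fence comes first in the total order,
either the writer sees frozen set and backs out, or the freezer sees the
writer's increment. With either fence dropped, the C11 memory model no
longer forbids the bad outcome.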