Sorry, I seem to have missed this email.

On Mon, May 06, 2019 at 06:50:09PM +0200, Oleg Nesterov wrote:
> On 05/03, Peter Zijlstra wrote:
> >
> > -static void lockdep_sb_freeze_release(struct super_block *sb)
> > -{
> > -	int level;
> > -
> > -	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
> > -		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
> > -}
> > -
> > -/*
> > - * Tell lockdep we are holding these locks before we call ->unfreeze_fs(sb).
> > - */
> > -static void lockdep_sb_freeze_acquire(struct super_block *sb)
> > -{
> > -	int level;
> > -
> > -	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> > -		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
> > +		percpu_down_write_non_owner(sb->s_writers.rw_sem + level-1);
> >  }
>
> I'd suggest to not change fs/super.c, keep these helpers, and even not introduce
> xxx_write_non_owner().
>
> freeze_super() takes other locks, it calls sync_filesystem(), freeze_fs(), lockdep
> should know that this task holds SB_FREEZE_XXX locks for writing.

Bah, I so hate these games. But OK, I suppose.

> > @@ -80,14 +83,8 @@ int __percpu_down_read(struct percpu_rw_
> >  	 * and reschedule on the preempt_enable() in percpu_down_read().
> >  	 */
> >  	preempt_enable_no_resched();
> > -
> > -	/*
> > -	 * Avoid lockdep for the down/up_read() we already have them.
> > -	 */
> > -	__down_read(&sem->rw_sem);
> > +	wait_event(sem->waiters, !atomic_read(&sem->block));
> >  	this_cpu_inc(*sem->read_count);
>
> Argh, this looks racy :/
>
> Suppose that sem->block == 0 when wait_event() is called, iow the writer released
> the lock.
>
> Now suppose that this __percpu_down_read() races with another percpu_down_write().
> The new writer can set sem->block == 1 and call readers_active_check() in between,
> after wait_event() and before this_cpu_inc(*sem->read_count).

	CPU0			CPU1				CPU2

	percpu_up_write()
	  sem->block = 0;

				__percpu_down_read()
				  wait_event(, !sem->block);

								percpu_down_write()
								  wait_event_exclusive(,
								    xchg(sem->block, 1) == 0);
								  readers_active_check()

				  this_cpu_inc();

*whoopsy* reader while write owned.

I suppose we can 'patch' that by checking ->block again after we've
incremented, something like the below. But looking at percpu_down_write()
we have two wait_event*() on the same queue back to back, which is 'odd'
at best.

Let me ponder that a little more.

---
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -61,6 +61,7 @@ int __percpu_down_read(struct percpu_rw_
 	 * writer missed them.
 	 */

+again:
 	smp_mb(); /* A matches D */

 	/*
@@ -87,7 +88,13 @@ int __percpu_down_read(struct percpu_rw_
 	wait_event(sem->waiters, !atomic_read_acquire(&sem->block));
 	this_cpu_inc(*sem->read_count);
 	preempt_disable();
-	return 1;
+
+	/*
+	 * percpu_down_write() could've set ->block right after we've seen it
+	 * 0 but missed our this_cpu_inc(), which is exactly the condition we
+	 * get called for from percpu_down_read().
+	 */
+	goto again;
 }
 EXPORT_SYMBOL_GPL(__percpu_down_read);
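
For readability, the reader slow path with the above applied then reads
like so -- the middle part is paraphrased from the existing code and may
differ slightly in this series, so take it as a sketch, not the actual
patch:

int __percpu_down_read(struct percpu_rw_semaphore *sem, int try)
{
	/* caller already did this_cpu_inc(*sem->read_count) */
again:
	smp_mb(); /* A matches D */

	/* (sketch of the unchanged middle) no pending writer: success */
	if (likely(!atomic_read_acquire(&sem->block)))
		return 1;

	__percpu_up_read(sem);		/* back out the increment */
	if (try)
		return 0;

	preempt_enable_no_resched();

	wait_event(sem->waiters, !atomic_read_acquire(&sem->block));
	this_cpu_inc(*sem->read_count);
	preempt_disable();

	/*
	 * ->block could've been set between the wait_event() and the
	 * this_cpu_inc(); go around and re-validate under the barrier.
	 */
	goto again;
}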
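
And to make Oleg's window concrete, here's a little userspace toy that
forces that exact interleaving with two barriers. Obviously not the real
percpu-rwsem -- one global counter instead of per-CPU counters, busy-wait
instead of a waitqueue -- the names just mirror the kernel ones:

/* race_model.c -- build with: cc -pthread race_model.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int block;		/* models sem->block            */
static atomic_int read_count;		/* models SUM(*sem->read_count) */
static pthread_barrier_t checked;	/* reader saw block == 0        */
static pthread_barrier_t scanned;	/* writer scanned read_count    */

static void *reader(void *arg)
{
	/* __percpu_down_read(): wait_event(sem->waiters, !sem->block) */
	while (atomic_load(&block))
		;

	pthread_barrier_wait(&checked);	/* open the window...           */
	pthread_barrier_wait(&scanned);	/* ...and keep it open          */

	atomic_fetch_add(&read_count, 1);	/* this_cpu_inc()       */
	printf("reader: in (block=%d)\n", atomic_load(&block));
	return NULL;
}

static void *writer(void *arg)
{
	pthread_barrier_wait(&checked);	/* reader is inside the window  */

	atomic_exchange(&block, 1);		/* xchg(sem->block, 1)  */
	if (atomic_load(&read_count) == 0)	/* readers_active_check() */
		printf("writer: in (write-owned)\n");

	pthread_barrier_wait(&scanned);	/* now let the reader increment */
	return NULL;
}

int main(void)
{
	pthread_t r, w;

	pthread_barrier_init(&checked, NULL, 2);
	pthread_barrier_init(&scanned, NULL, 2);
	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(r, NULL);
	pthread_join(w, NULL);
	return 0;
}

Both threads print "in": the reader ends up inside a write-owned lock,
which is the *whoopsy* above. With the goto-again fix the reader would
notice ->block on the re-check and back its increment out.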