On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote: > > > On 2023/7/13 12:27, Paul E. McKenney wrote: > > On Thu, Jul 13, 2023 at 10:02:17AM +0800, Gao Xiang wrote: > > > > > > > > > On 2023/7/13 08:32, Joel Fernandes wrote: > > > > On Wed, Jul 12, 2023 at 02:20:56PM -0700, Sandeep Dhavale wrote: > > > > [..] > > > > > > As such this patch looks correct to me, one thing I noticed is that > > > > > > you can check rcu_is_watching() like the lockdep-enabled code does. > > > > > > That will tell you also if a reader-section is possible because in > > > > > > extended-quiescent-states, RCU readers should be non-existent or > > > > > > that's a bug. > > > > > > > > > > > Please correct me if I am wrong, reading from the comment in > > > > > kernel/rcu/update.c rcu_read_lock_held_common() > > > > > .. > > > > > * The reason for this is that RCU ignores CPUs that are > > > > > * in such a section, considering these as in extended quiescent state, > > > > > * so such a CPU is effectively never in an RCU read-side critical section > > > > > * regardless of what RCU primitives it invokes. > > > > > > > > > > It seems rcu will treat this as lock not held rather than a fact that > > > > > lock is not held. Is my understanding correct? > > > > > > > > If RCU treats it as a lock not held, that is a fact for RCU ;-). Maybe you > > > > mean it is not a fact for erofs? > > > > > > I'm not sure if I get what you mean, EROFS doesn't take any RCU read lock > > > here: > > > > The key point is that we need lockdep to report errors when > > rcu_read_lock(), rcu_dereference(), and friends are used when RCU is > > not watching. We also need lockdep to report an error when someone > > uses rcu_dereference() when RCU is not watching, but also forgets the > > rcu_read_lock(). > > > > And this is the job of rcu_read_lock_held(), which is one reason why > > that rcu_is_watching() is needed. > > > > > z_erofs_decompressqueue_endio() is actually a "bio->bi_end_io", previously > > > which can be called under two scenarios: > > > > > > 1) under softirq context, which is actually part of device I/O compleltion; > > > > > > 2) under threaded context, like what dm-verity or likewise calls. > > > > > > But EROFS needs to decompress in a threaded context anyway, so we trigger > > > a workqueue to resolve the case 1). > > > > > > Recently, someone reported there could be some case 3) [I think it was > > > introduced recently but I have no time to dig into it]: > > > > > > case 3: under RCU read lock context, which is shown by this: > > > https://lore.kernel.org/r/4a8254eb-ac39-1e19-3d82-417d3a7b9f94@xxxxxxxxxxxxxxxxx/T/#u > > > > > > and such RCU read lock is taken in __blk_mq_run_dispatch_ops(). > > > > > > But as the commit shown, we only need to trigger a workqueue for case 1) > > > and 3) due to performance reasons. > > > > Just out of curiosity, exactly how much is it costing to trigger the > > workqueue? > > There are lots of performance issues here and even a plumber > topic last year to show that, see: > > [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@xxxxxxxxxx > [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@xxxxxxxxxxxxxx > [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@xxxxxxxxxxxxxx > [4] https://lpc.events/event/16/contributions/1338/ > and more. > > I'm not sure if it's necessary to look info all of that, > andSandeep knows more than I am (the scheduling issue > becomes vital on some aarch64 platform.) Hmmm... Please let me try again. Assuming that this approach turns out to make sense, the resulting patch will need to clearly state the performance benefits directly in the commit log. And of course, for the approach to make sense, it must avoid breaking the existing lockdep-RCU debugging code. Is that more clear? Thanx, Paul