On Thu, Jul 13, 2023 at 09:33:35AM -0700, Paul E. McKenney wrote:
> On Thu, Jul 13, 2023 at 11:33:24AM -0400, Joel Fernandes wrote:
> > On Thu, Jul 13, 2023 at 10:34 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
> > > On 2023/7/13 22:07, Joel Fernandes wrote:
> > > > On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
> > > >> On 2023/7/13 12:52, Paul E. McKenney wrote:
> > > >>> On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
> > > >>
> > > >> ...
> > > >>
> > > >>>> There are lots of performance issues here and even a Plumbers
> > > >>>> topic last year to show that, see:
> > > >>>>
> > > >>>> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@xxxxxxxxxx
> > > >>>> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@xxxxxxxxxxxxxx
> > > >>>> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@xxxxxxxxxxxxxx
> > > >>>> [4] https://lpc.events/event/16/contributions/1338/
> > > >>>> and more.
> > > >>>>
> > > >>>> I'm not sure if it's necessary to look into all of that,
> > > >>>> and Sandeep knows more than I do (the scheduling issue
> > > >>>> becomes vital on some aarch64 platforms.)
> > > >>>
> > > >>> Hmmm... Please let me try again.
> > > >>>
> > > >>> Assuming that this approach turns out to make sense, the resulting
> > > >>> patch will need to clearly state the performance benefits directly in
> > > >>> the commit log.
> > > >>>
> > > >>> And of course, for the approach to make sense, it must avoid breaking
> > > >>> the existing lockdep-RCU debugging code.
> > > >>>
> > > >>> Is that more clear?
> > > >>
> > > >> Personally I'm not working on the Android platform any more, so I don't
> > > >> have a way to reproduce it; hopefully Sandeep could give actual
> > > >> numbers _again_ for the case where dm-verity is enabled and another
> > > >> workqueue is triggered here, and explain why the scheduling latency
> > > >> of the extra work becomes unacceptable.
> > > >
> > > > Question from my side: are we talking about only performance issues or
> > > > also a crash? It appears z_erofs_decompress_pcluster() takes
> > > > mutex_lock(&pcl->lock);
> > > >
> > > > So if it is either in an RCU read-side critical section or in an
> > > > atomic section, like the softirq path, then it may
> > > > schedule-while-atomic or trigger RCU warnings:
> > > >
> > > > z_erofs_decompressqueue_endio
> > > >  -> z_erofs_decompress_kickoff
> > > >   -> z_erofs_decompressqueue_work
> > > >    -> z_erofs_decompress_queue
> > > >     -> z_erofs_decompress_pcluster
> > > >      -> mutex_lock
> > >
> > > Why does the softirq path not trigger a workqueue instead?
> >
> > I said "if it is". I was giving a scenario. mutex_lock() is not
> > allowed in softirq context or in an RCU reader.
> >
> > > > Per Sandeep in [1], this stack happens under the RCU read lock in:
> > > >
> > > > #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops) \
> > > > [...]
> > > >         rcu_read_lock();
> > > >         (dispatch_ops);
> > > >         rcu_read_unlock();
> > > > [...]
> > > >
> > > > Coming from:
> > > > blk_mq_flush_plug_list ->
> > > >         blk_mq_run_dispatch_ops(q,
> > > >                 __blk_mq_flush_plug_list(q, plug));
> > > >
> > > > and __blk_mq_flush_plug_list does this:
> > > > q->mq_ops->queue_rqs(&plug->mq_list);
> > > >
> > > > This somehow ends up calling bio_endio and then
> > > > z_erofs_decompressqueue_endio, which grabs the mutex.
> > > >
> > > > So... I have a question: it looks like one of the paths in
> > > > __blk_mq_run_dispatch_ops() uses SRCU, whereas the alternate
> > > > path uses RCU. Why does this alternate path want to block even
> > > > though it is not supposed to? Is the real issue here that
> > > > BLK_MQ_F_BLOCKING should be set? It sounds like you want to block
> > > > in the "else" path even though BLK_MQ_F_BLOCKING is not set:
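For reference, the two branches being contrasted live in the
__blk_mq_run_dispatch_ops() macro in block/blk-mq.h. A paraphrased
sketch of its shape around the v6.4 timeframe follows; the comments are
added here, and details such as where the srcu_struct lives (it moved
into struct blk_mq_tag_set) vary by kernel version:

#define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)        \
do {                                                                    \
        if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {                  \
                struct blk_mq_tag_set *__tag_set = (q)->tag_set;        \
                int srcu_idx;                                           \
                                                                        \
                might_sleep_if(check_sleep);                            \
                /* Blocking driver: sleepable SRCU read side. */        \
                srcu_idx = srcu_read_lock(__tag_set->srcu);             \
                (dispatch_ops);                                         \
                srcu_read_unlock(__tag_set->srcu, srcu_idx);            \
        } else {                                                        \
                /* Default: plain RCU read side, so dispatch_ops */    \
                /* and anything it runs synchronously (including */    \
                /* completions) must not sleep. */                      \
                rcu_read_lock();                                        \
                (dispatch_ops);                                         \
                rcu_read_unlock();                                      \
        }                                                               \
} while (0)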
> > >
> > > BLK_MQ_F_BLOCKING is not a flag that a filesystem can do anything with.
> > > That is block layer and mq device driver stuff; filesystems cannot set
> > > this value.
> > >
> > > As I said, as far as I understand, previously .end_io() could only
> > > be called outside of RCU context, so it was fine, but I don't know
> > > since when .end_io() can be called under some RCU context.
> >
> > From what Sandeep described, the code path is in an RCU reader. My
> > question is more, why doesn't it use SRCU instead, since it clearly
> > does so if BLK_MQ_F_BLOCKING is set? What are the tradeoffs? IMHO, a
> > deeper dive needs to be made into that before concluding that the fix
> > is to use rcu_read_lock_any_held().
>
> How can this be solved?
>
> 1. Always use a workqueue. Simple, but is said to have performance
>    issues.
>
> 2. Pass in a flag that indicates whether or not the caller is in an
>    RCU read-side critical section. Conceptually simple, but might
>    or might not be reasonable to actually implement in the code as
>    it exists now. (You tell me!)
>
> 3. Create a function in z_erofs that gives you a decent
>    approximation, maybe something like the following.
>
> 4. Other ideas here.

5. #3 plus make the corresponding Kconfig option select PREEMPT_COUNT,
   assuming that any users needing compression in non-preemptible
   kernels are OK with PREEMPT_COUNT being set. (Some users of
   non-preemptible kernels object strenuously to the added overhead
   from CONFIG_PREEMPT_COUNT=y.)

							Thanx, Paul

> The following is untested, and is probably quite buggy, but it should
> provide you with a starting point.
>
> static bool z_erofs_wq_needed(void)
> {
> 	if (IS_ENABLED(CONFIG_PREEMPTION) && rcu_preempt_depth())
> 		return true;	// RCU reader
> 	if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preemptible())
> 		return true;	// non-preemptible
> 	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
> 		return true;	// non-preemptible kernel, so play it safe
> 	return false;
> }
>
> You break it, you buy it! ;-)
>
> 							Thanx, Paul
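To make the intended use of option #3 concrete, here is a hypothetical
sketch of how such a helper might be wired into the erofs kickoff path.
The workqueue name z_erofs_workqueue and the io->u.work member match the
existing fs/erofs/zdata.c machinery, but this caller and its exact shape
are illustrative only, not the actual erofs code:

/*
 * Hypothetical wiring of z_erofs_wq_needed(): run decompression
 * inline only when it is provably safe to sleep, and otherwise
 * punt to the workqueue. Illustrative sketch, untested.
 */
static void z_erofs_decompress_kickoff_sketch(struct z_erofs_decompressqueue *io)
{
	if (z_erofs_wq_needed()) {
		/* Possibly atomic or inside an RCU reader: defer. */
		queue_work(z_erofs_workqueue, &io->u.work);
	} else {
		/* Preemptible and outside any RCU reader: run inline. */
		z_erofs_decompressqueue_work(&io->u.work);
	}
}

Note the asymmetry in the helper itself: on CONFIG_PREEMPT_COUNT=n
kernels, preemptible() cannot be evaluated, so the sketch conservatively
always takes the workqueue path there, which is what motivates option #5
above.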