On Thu, Jul 13, 2023 at 12:59 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
> On 2023/7/13 12:52, Paul E. McKenney wrote:
> > On Thu, Jul 13, 2023 at 12:41:09PM +0800, Gao Xiang wrote:
> >>
> >>

...

> >>
> >> There are lots of performance issues here and even a Plumbers
> >> topic last year to show that, see:
> >>
> >> [1] https://lore.kernel.org/r/20230519001709.2563-1-tj@xxxxxxxxxx
> >> [2] https://lore.kernel.org/r/CAHk-=wgE9kORADrDJ4nEsHHLirqPCZ1tGaEPAZejHdZ03qCOGg@xxxxxxxxxxxxxx
> >> [3] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@xxxxxxxxxxxxxx
> >> [4] https://lpc.events/event/16/contributions/1338/
> >> and more.
> >>
> >> I'm not sure if it's necessary to look into all of that,
> >> and Sandeep knows more than I do (the scheduling issue
> >> becomes vital on some aarch64 platforms.)
> >
> > Hmmm...  Please let me try again.
> >
> > Assuming that this approach turns out to make sense, the resulting
> > patch will need to clearly state the performance benefits directly in
> > the commit log.
> >
> > And of course, for the approach to make sense, it must avoid breaking
> > the existing lockdep-RCU debugging code.
> >
> > Is that more clear?
>
> Personally I'm not working on the Android platform any more, so I don't
> have a way to reproduce this; hopefully Sandeep could give actual
> numbers _again_ for the case where dm-verity is enabled and triggers
> another workqueue here, and make a comparison showing why the
> scheduling latency of the extra work becomes unacceptable.

Question from my side: are we talking about only performance issues, or also a crash? It appears z_erofs_decompress_pcluster() takes:

  mutex_lock(&pcl->lock);

So if it runs either in an RCU read-side critical section or in an atomic section, like the softirq path, then it may schedule-while-atomic or trigger RCU warnings.
The call chain is:

  z_erofs_decompressqueue_endio
   -> z_erofs_decompress_kickoff
    -> z_erofs_decompressqueue_work
     -> z_erofs_decompress_queue
      -> z_erofs_decompress_pcluster
       -> mutex_lock

Per Sandeep in [1], this stack happens under rcu_read_lock() in:

  #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)
  [...]
          rcu_read_lock();
          (dispatch_ops);
          rcu_read_unlock();
  [...]

Coming from:

  blk_mq_flush_plug_list ->
          blk_mq_run_dispatch_ops(q,
                  __blk_mq_flush_plug_list(q, plug));

and __blk_mq_flush_plug_list does this:

          q->mq_ops->queue_rqs(&plug->mq_list);

This somehow ends up calling bio_endio() and then z_erofs_decompressqueue_endio(), which grabs the mutex.

So I have a question: one of the paths in __blk_mq_run_dispatch_ops() uses SRCU, whereas the alternate path uses plain RCU. Why does this alternate path want to block even though it is not supposed to? Is the real issue here that BLK_MQ_F_BLOCKING should be set? It sounds like you want to block in the "else" path even though BLK_MQ_F_BLOCKING is not set:

  /* run the code block in @dispatch_ops with rcu/srcu read lock held */
  #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)	\
  do {								\
  	if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {		\
  		struct blk_mq_tag_set *__tag_set = (q)->tag_set;	\
  		int srcu_idx;					\
  								\
  		might_sleep_if(check_sleep);			\
  		srcu_idx = srcu_read_lock(__tag_set->srcu);	\
  		(dispatch_ops);					\
  		srcu_read_unlock(__tag_set->srcu, srcu_idx);	\
  	} else {						\
  		rcu_read_lock();				\
  		(dispatch_ops);					\
  		rcu_read_unlock();				\
  	}							\
  } while (0)

It seems quite incorrect to me that erofs wants to block even though BLK_MQ_F_BLOCKING is clearly not set. Did I miss something?

I believe it may be more beneficial to correct this properly now, rather than maintaining the appearance of non-blocking operation and potentially having to manage atomicity detection with preempt counters at a later stage.

thanks,

 - Joel

[1] https://lore.kernel.org/all/CAB=BE-QV0PiKFpCOcdEUFxS4wJHsLCcsymAw+nP72Yr3b1WMng@xxxxxxxxxxxxxx/