On Wed, Feb 01, 2023 at 04:37:00PM +0800, Yujie Liu wrote: > Hi Paul, Hi Liam, > > On Tue, Jan 31, 2023 at 12:52:21PM -0800, Paul E. McKenney wrote: > > On Tue, Jan 31, 2023 at 03:45:20PM -0500, Liam R. Howlett wrote: > > > * Paul E. McKenney <paulmck@xxxxxxxxxx> [230131 15:26]: > > > > On Tue, Jan 31, 2023 at 03:18:22PM +0800, kernel test robot wrote: > > > > > Hi Liam, > > > > > > > > > > We caught a "task blocked" dmesg in maple tree test. Not sure if this > > > > > is expected for maple tree test, so we are sending this report for > > > > > your information. Thanks. > > > > > > > > > > Greeting, > > > > > > > > > > FYI, we noticed INFO:task_blocked_for_more_than#seconds due to commit (built with clang-14): > > > > > > > > > > commit: 120b116208a0877227fc82e3f0df81e7a3ed4ab1 ("maple_tree: reorganize testing to restore module testing") > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > > > in testcase: boot > > > > > > > > > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > > > > > > > > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): > > > > > > > > > > > > > > > [ 17.318428][ T1] calling maple_tree_seed+0x0/0x15d0 @ 1 > > > > > [ 17.319219][ T1] > > > > > [ 17.319219][ T1] TEST STARTING > > > > > [ 17.319219][ T1] > > > > > [ 999.249871][ T23] INFO: task rcu_scale_shutd:59 blocked for more than 491 seconds. > > > > > [ 999.253363][ T23] Not tainted 6.1.0-rc4-00003-g120b116208a0 #1 > > > > > [ 999.254249][ T23] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > > > > [ 999.255390][ T23] task:rcu_scale_shutd state:D stack:30968 pid:59 ppid:2 flags:0x00004000 > > > > > [ 999.256934][ T23] Call Trace: > > > > > [ 999.257418][ T23] <TASK> > > > > > [ 999.257900][ T23] __schedule+0x169b/0x1f90 > > > > > [ 999.261677][ T23] schedule+0x151/0x300 > > > > > [ 999.262281][ T23] ? compute_real+0xe0/0xe0 > > > > > [ 999.263364][ T23] rcu_scale_shutdown+0xdd/0x130 > > > > > [ 999.264093][ T23] ? wake_bit_function+0x2c0/0x2c0 > > > > > [ 999.268985][ T23] kthread+0x309/0x3a0 > > > > > [ 999.269958][ T23] ? compute_real+0xe0/0xe0 > > > > > [ 999.270552][ T23] ? kthread_unuse_mm+0x200/0x200 > > > > > [ 999.271281][ T23] ret_from_fork+0x1f/0x30 > > > > > [ 999.272385][ T23] </TASK> > > > > > [ 999.272865][ T23] > > > > > [ 999.272865][ T23] Showing all locks held in the system: > > > > > [ 999.273988][ T23] 2 locks held by swapper/0/1: > > > > > [ 999.274684][ T23] 1 lock held by khungtaskd/23: > > > > > [ 999.275400][ T23] #0: ffffffff88346e00 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x8/0x30 > > > > > [ 999.277171][ T23] > > > > > [ 999.277525][ T23] ============================================= > > > > > [ 999.277525][ T23] > > > > > [ 1049.050884][ T1] maple_tree: 12610686 of 12610686 tests passed > > > > > > > > > > > > > > > If you fix the issue, kindly add following tag > > > > > | Reported-by: kernel test robot <yujie.liu@xxxxxxxxx> > > > > > | Link: https://lore.kernel.org/oe-lkp/202301310940.4a37c7af-yujie.liu@xxxxxxxxx > > > > > > > > Liam brought this to my attention on IRC, and it looks like the root > > > > cause is that the rcuscale code does not deal gracefully with grace > > > > periods that are in much excess of a second in duration. > > > > > > > > Now, it might well be worth looking into why the grace periods were taking > > > > that long, but if you were running Maple Tree stress tests concurrently > > > > with rcuscale, this might well be expected behavior. > > > > > > > > > > This could be simply cpu starvation causing no foward progress in your > > > tests with the number of concurrent running tests and "-smp 2". > > > > > > It's also worth noting that building in the rcu test module makes the > > > machine turn off once the test is complete. This can be seen in your > > > console message: > > > [ 13.254240][ T1] rcu-scale:--- Start of test: nreaders=2 nwriters=2 verbose=1 shutdown=1 > > > > > > so your machine may not have finished running through the array of tests > > > you have specified to build in - which is a lot. I'm not sure if this > > > is the best approach considering the load that produces on the system > > > and how difficult it is (was) to figure out which test is causing a > > > stall, or other issue. > > > > Agreed, both rcuscale and refscale when built in turn the machine off at > > the end of the test. For providing background stress for some other test > > (in this case Maple Tree tests), rcutorture, locktorture, or scftorture > > might be better choices. > > Thanks for looking into this. This is a boot test on a randconfig > kernel, and yes, it happend to select CONFIG_RCU_SCALE_TEST=y together > with CONFIG_TEST_MAPLE_TREE=y, leading to the situation in this case. > > I've tested the patch on same config, it does clear up the "task > blocked" log, though it still waits a long time at this step. The test > result is as follows: > > [ 18.397784][ T1] calling maple_tree_seed+0x0/0x15d0 @ 1 > [ 18.398646][ T1] > [ 18.398646][ T1] TEST STARTING > [ 18.398646][ T1] > [ 1266.450656][ T1] maple_tree: 12610686 of 12610686 tests passed > [ 1266.451749][ T1] initcall maple_tree_seed+0x0/0x15d0 returned 0 after 1248053116 usecs > ... > > ========================================================================================= > compiler/kconfig/rootfs/sleep/tbox_group/testcase: > clang-14/x86_64-randconfig-a006-20230116/yocto-x86_64-minimal-20190520.cgz/300/vm-snb/boot > > commit: > 120b116208a08 ("maple_tree: reorganize testing to restore module testing") > 4b1aafbdb208f ("rcuscale: Move shutdown from wait_event() to wait_event_idle()") > > 120b116208a08 4b1aafbdb208fb4e10bf7abff1a > ---------------- --------------------------- > fail:runs %reproduction fail:runs > | | | > 5:6 -83% :6 dmesg.INFO:task_blocked_for_more_than#seconds > > Tested-by: Yujie Liu <yujie.liu@xxxxxxxxx> Thank you very much! I will apply this on my next rebase. Thanx, Paul