Hi Marc,

Thanks a lot for taking a look.

On Mon, Jan 16, 2023 at 1:03 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> Hi Joel,
>
> On Mon, 16 Jan 2023 17:03:31 +0000,
> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> > I am seeing -EBUSY returned a lot during torture_onoff() when running
> > rcutorture on arm64. This causes hotplug failure 30% of the time. I am
> > also seeing this in 6.1-rc kernels. I believe see this only for CPU0.
> >
> > This causes warnings in torture tests:
> > [ 217.582290] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> > [ 221.866362] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> >
> > Full kernel log here:
> > http://box.joelfernandes.org:9080/job/rcutorture_stable_arm/job/linux-5.15.y/7/artifact/tools/testing/selftests/rcutorture/res/2023.01.15-14.51.11/TREE04/console.log
> >
> > Any ideas on why this is happening and only for CPU 0 (presumably the
> > boot CPU)? I'd personally need these warnings to go away for my tests
> > as this causes rcutorture's tests to not cleanly pass for me. It
> > appears remove_cpu() -> device_offline() is what returns the error.
>
> I've taken your kernel for a ride as a KVM guest (probably similar to
> what you are doing), and saw the same thing (CPU0 not offlining):
>
> [ 64.555845] Detected VIPT I-cache on CPU4
> [ 64.556146] GICv3: CPU4: found redistributor 4 region 0:0x000000003ff70000
> [ 64.556689] CPU4: Booted secondary processor 0x0000000004 [0x612f0290]
> [ 69.823670] rcu-torture:torture_onoff task: offline 0 failed: errno -16
> [ 73.991960] psci: CPU7 killed (polled 0 ms)
> [ 74.239626] rcu-torture: rcu_torture_read_exit: Start of episode
> [ 74.243863] rcu-torture: rcu_torture_read_exit: End of episode
>
> I then tried v6.2-rc4 with defconfig + RCU_TORTURE and your command
> line, and CPU0 does seem to hotplug off correctly:

Interesting, can you try the Config fragment of the failing config on
6.2-rc4 [1]?

[1] http://box.joelfernandes.org:9080/job/rcutorture_stable_arm/job/linux-5.15.y/7/artifact/tools/testing/selftests/rcutorture/res/2023.01.15-14.51.11/TREE04/ConfigFragment

Notably, it has the following options, which Zhoui said he was able to
repro with on another arch as well:

CONFIG_NO_HZ_IDLE=n
CONFIG_NO_HZ_FULL=y

> [ 47.217109] psci: CPU3 killed (polled 0 ms)
> [ 52.241009] Detected VIPT I-cache on CPU3
> [ 52.241227] cacheinfo: Unable to detect cache hierarchy for CPU 3
> [ 52.241481] GICv3: CPU3: found redistributor 3 region 0:0x000000003ff50000
> [ 52.241849] CPU3: Booted secondary processor 0x0000000003 [0x612f0290]
> [ 56.337011] psci: CPU0 killed (polled 0 ms)
> [...]
> [ 121.090339] rcu-torture: Free-Block Circulation: 922 920 919 918 917 916 914 913 912 911 0
> [ 125.574311] Detected VIPT I-cache on CPU0
> [ 125.574557] cacheinfo: Unable to detect cache hierarchy for CPU 0
> [ 125.574901] GICv3: CPU0: found redistributor 0 region 0:0x000000003fef0000
> [ 125.575322] CPU0: Booted secondary processor 0x0000000000 [0x612f0290]
> [ 130.176893] rcu-torture: rcu_torture_read_exit: Start of episode
> [ 130.317001] psci: CPU0 killed (polled 0 ms)
> [...]
> [ 225.588999] Detected VIPT I-cache on CPU0
> [ 225.589224] cacheinfo: Unable to detect cache hierarchy for CPU 0
> [ 225.589535] GICv3: CPU0: found redistributor 0 region 0:0x000000003fef0000
> [ 225.589946] CPU0: Booted secondary processor 0x0000000000 [0x612f0290]
>
> No such error is being reported.
>
> Is there anything special in your config that would help triggering
> this with the current tip of tree?

Perhaps your config needs the options in the config fragment I
mentioned above.

Thanks!

- Joel
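
P.S. For comparing runs (for example 5.15.y against 6.2-rc4 with the same
config fragment), a rough sketch of how the failed offlines in a console.log
could be tallied per CPU. This is not part of rcutorture; the function name
count_offline_failures and the regular expression are assumptions based only
on the "offline N failed: errno -16" lines quoted above.

```python
import re
from collections import Counter

# Assumed log format, taken from the lines quoted in this thread, e.g.:
# [ 217.582290] rcu-torture:torture_onoff task: offline 0 failed: errno -16
FAILED_OFFLINE = re.compile(
    r"torture_onoff task: offline (\d+) failed: errno (-?\d+)"
)


def count_offline_failures(log_text):
    """Count failed offline attempts, keyed by (cpu, errno)."""
    failures = Counter()
    for line in log_text.splitlines():
        match = FAILED_OFFLINE.search(line)
        if match:
            cpu, errno = int(match.group(1)), int(match.group(2))
            failures[(cpu, errno)] += 1
    return failures


# Sample input built from log lines quoted earlier in the thread.
sample = """\
[ 217.582290] rcu-torture:torture_onoff task: offline 0 failed: errno -16
[ 221.866362] rcu-torture:torture_onoff task: offline 0 failed: errno -16
[ 73.991960] psci: CPU7 killed (polled 0 ms)
"""

for (cpu, errno), count in sorted(count_offline_failures(sample).items()):
    print(f"CPU{cpu}: {count} failed offline(s), errno {errno}")
```

If the failures really are limited to the boot CPU, only CPU0 should show up
in the tally, with errno -16 (-EBUSY).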