On Tue, Jan 17, 2023 at 11:43 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
[...]
> > >>>>
> > >>>> How about something simple like the following? (untested)
> > >>>>
> > >>>> ---8<-----------------------
> > >>>>
> > >>>> diff --git a/kernel/torture.c b/kernel/torture.c
> > >>>> index bc8fb361efc0..cd64110694c0 100644
> > >>>> --- a/kernel/torture.c
> > >>>> +++ b/kernel/torture.c
> > >>>> @@ -220,6 +220,9 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
> > >>>>                         // PCI probe frequently disables hotplug during boot.
> > >>>>                         (*n_offl_attempts)--;
> > >>>>                         s = " (-EBUSY forgiven during boot)";
> > >>>> +               } else if (tick_nohz_full_running && ret == -EBUSY) {
> > >>>> +                       (*n_offl_attempts)--;
> > >>>> +                       s = " (-EBUSY forgiven if nohz_full is running)";
> > >>> Fantastic fix!! thus we can fix the time keeper cpu torture problem
> > >>> without touch the time keeper code.
> > >>
> > >> Thanks. Unfortunately this does not fix the issue for TRACE02 and the patch
> > >> you shared does not fix it either -- because TRACE02 is not a no-hz-full
> > >> test. :-(
> > >>
> > >> We will need to do a bit of tracing to figure out where the -EBUSY is coming
> > >> from for TRACE02.
> > > agree TRACE02 is another issue, unfortunately I can't reproduce the
> > > bug neither with your original Image [1]
> > > nor with my cross compiled kernel using [2].
> > >
> > > I guess there may be two reasons:
> > > 1) my testbed is X86_64 based.
> > > 2) the command that I invoke qemu is not right:
> > > 2-1) the newly compiled linux-5.15.89-rc1
> > > qemu-system-aarch64 -machine virt -cpu cortex-a57 -nographic -smp 4
> >
> > Does 8 CPUs make any difference? That is my setup.
> 8 CPUs make no difference ;-(

Ah, it was worth a try! Hmm.

> > Not sure what else is different. It could be a CPU model specific issue, or something.

But why don't you just use the same setup you used in November and check TRACE02?
That is actually what I was requesting you to test, since you saw the same
issue on that setup.

> I guess it may be a CPU model specific issue, while I can't invoke
> qemu-system-aarch64 with "-machine virt,gic-version=host -cpu host"
> because I didn't have an aarch64 bare metal host.
>
> OK, I am doing the same setup on linux-5.15.y as I did last November
> in the PPC VM of Open Source Lab of Oregon State University, this will
> take about 20 hours, and report what I found after the test finishes.

Sounds good, Thanks!

Thanks,

 - Joel