Re: TREE04 and TREE07 failures on 5.15 stable

> On Aug 7, 2023, at 10:06 PM, Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> 
> On Tue, Aug 8, 2023 at 5:08 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
>> 
>>> On Tue, Aug 8, 2023 at 12:56 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, Aug 7, 2023 at 11:47 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
>>>> 
>>>> On Mon, Aug 7, 2023 at 11:38 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> Btw full logs are here:
>>>>> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-5.15.y/80/artifact/tools/testing/selftests/rcutorture/res/2023.08.07-04.01.19/
>>>>> 
>>>>> And some more comments:
>>>>> 
>>>>> On Mon, Aug 7, 2023 at 10:22 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>>>>> 
>>>>>> Hello, Frederick, all,
>>>>>> TREE04 and TREE07 still fail on the 5.15 kernel.
>>>>>> 
>>>>>> The stutter patches helped 5.10, 6.1 and 6.4 but 5.15 still is
>>>>>> troubled. Any thoughts?
>>>>>> 
>>>>>> OTOH, I am not sure how many NO_HZ_FULL users are on the 5.15 kernel.
>>>>>> Do you have any idea?
>>>>>> 
>>>>>> Meanwhile I'll continue to debug it. Thank you!
>>>>>> 
>>>>>> --- Mon Aug  7 07:37:10 AM UTC 2023 Test summary:
>>>>>> Results directory:
>>>>>> /usr/local/google/home/joelaf/.jenkins/workspace/rcutorture_stable_linux-5.15.y/tools/testing/selftests/rcutorture/res/2023.08.07-04.01.19
>>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 48 --duration 60
>>>>>> RUDE01 ------- 22984 GPs (6.38444/s) [tasks-rude: g0 f0x0 total-gps=0]
>>>>>> SRCU-N ------- 318544 GPs (88.4844/s) [srcu: g4127136 f0x0 total-gps=4127136]
>>>>>> SRCU-P ------- 268977 GPs (74.7158/s) [srcud: g2180244 f0x0 total-gps=2180244]
>>>>>> SRCU-T ------- 497422 GPs (138.173/s) [srcu: g57760 f0x0 total-gps=57760]
>>>>>> SRCU-U ------- 1152598 GPs (320.166/s) [srcud: g33534 f0x0 total-gps=33534]
>>>>>> TASKS01 ------- 11516 GPs (3.19889/s) [tasks: g0 f0x0 total-gps=0]
>>>>>> TASKS02 ------- 13407 GPs (3.72417/s) [tasks: g0 f0x0 total-gps=0]
>>>>>> TASKS03 ------- 4598 GPs (1.27722/s) [tasks: g0 f0x0 total-gps=0]
>>>>>> TINY01 ------- 422041 GPs (117.234/s) [rcu: g0 f0x0 total-gps=0]
>>>>>> n_max_cbs: 77229
>>>>>> TINY02 ------- 441676 GPs (122.688/s) [rcu: g0 f0x0 total-gps=0]
>>>>>> n_max_cbs: 10691
>>>>>> TRACE01 ------- 12593 GPs (3.49806/s) [tasks-tracing: g0 f0x0 total-gps=0]
>>>>>> TRACE02 ------- 106606 GPs (29.6128/s) [tasks-tracing: g0 f0x0 total-gps=0]
>>>>>> TREE01 ------- 42614 GPs (11.8372/s) [rcu: g346169 f0x0 total-gps=86809]
>>>>>> TREE02 ------- 161421 GPs (44.8392/s) [rcu: g1356445 f0x0
>>>>>> total-gps=339371] n_max_cbs: 177936
>>>>>> TREE03 ------- 105074 GPs (29.1872/s) [rcu: g1699381 f0x2
>>>>>> total-gps=425089] n_max_cbs: 631295
>>>>>> TREE04 ------- 106036 GPs (29.4544/s) n_max_cbs: 550875
>>>>>> QEMU killed
>>>>> 
>>>>> This seems hotplug related. Not sure.
>>>>> 
>>>>>> TREE04 no success message, 222 successful version messages
>>>>>> !!! PID 2616159 hung at 3780 vs. 3600 seconds Mon Aug  7 06:27:38 AM UTC 2023
>>>>>> TREE05 ------- 132006 GPs (36.6683/s) [rcu: g1043673 f0x0
>>>>>> total-gps=261183] n_max_cbs: 160463
>>>>>> TREE07 ------- 52858 GPs (14.6828/s) n_max_cbs: 429828
>>>>>> QEMU killed
>>>>>> TREE07 no success message, 234 successful version messages
>>>>>> WARNING: Assertion failure in
>>>>>> /usr/local/google/home/joelaf/.jenkins/workspace/rcutorture_stable_linux-5.15.y/tools/testing/selftests/rcutorture/res/2023.08.07-04.01.19/TREE07/console.log
>>>>>> TREE07
>>>>>> WARNING: Summary: Call Traces: 6 Stalls: 5 Starves: 1
>>>>> 
>>>>> Looks like this was because it is missing:
>>>>> lore.kernel.org/r/20230101061555.278129-1-joel@xxxxxxxxxxxxxxxxx
>>>> 5.15.y (38d4ca22a528) is missing the above patch. I just started
>>>> tools/testing/selftests/rcutorture/bin/torture.sh in my i7-12800H
>>>> nested KVM Ubuntu environment; the test will run for about 7
>>>> hours.
>>>> 
>>>> Hope I can be of some help ;-)
>>> 
>>> Thanks Zhouyi ;-)
>>> 
>>> I am also doing a debug run to track down what's up with TREE04, you
>>> can see the test running live here ;-)
>>> http://box.joelfernandes.org:9080/job/rcutorture_stable/job/linux-5.15.y/83/console
>>> 
>>> (in case you want to see the kvm.sh options I passed for tracing, take
>>> a look at the link).
>> Thank you Joel for your instruction.
>> I stopped torture.sh and ran kvm.sh with the options above, both
>> on my i7-12800H and Dell PowerEdge R720 with two Intel(R) Xeon(R) CPU
>> E5-2660 (which I use to test [1]).
> I can't reproduce the failure in the above environments; maybe the
> hardware plays an important role in this scenario.
> 
> [ 7204.697370] mem_dump_obj() vmalloc test: rcu_torture_stats =
> 0000000000000000, &rhp = ffffc9000036feb0, rhp = ffffc9000000c000
> [ 7204.701895] mem_dump_obj(vmalloc ffffc9000000c000): 1-page vmalloc
> region starting at 0xffffc9000000c000 allocated at
> rcu_torture_cleanup.cold+0x218/c
> [ 7204.707293] mem_dump_obj(vmalloc ffffc9000000c008): 1-page vmalloc
> region starting at 0xffffc9000000c000 allocated at
> rcu_torture_cleanup.cold+0x218/c
> [ 7204.712755] rcu-torture: rtc: 0000000000000000 VER: 343876 tfle: 0
> rta: 343876 rtaf: 0 rtf: 343865 rtmbe: 0 rtmbkf: 0/138644 rtbe: 0
> rtbke: 0 rtbre: 0
> [ 7204.723319] rcu-torture: Reader Pipe:  12958247071 489089 0 0 0 0 0 0 0 0 0
> [ 7204.726240] rcu-torture: Reader Batch:  12957121006 1615154 0 0 0 0 0 0 0 0 0
> [ 7204.729146] rcu-torture: Free-Block Circulation:  343875 343875
> 343872 343871 343870 343869 343868 343867 343866 343865 0
> [ 7204.733441] rcu-torture:--- End of test: SUCCESS: nreaders=7
> nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> shuffle_interval=3 stutter=50
> [ 7204.750432] platform platform-framebuffer.0: shutdown
> 
> I very much regret not being able to help.

No regrets and thanks for checking. It could also be a host side issue in my setup.

I will keep digging, thanks.

- Joel


> 
> Thanks
> Zhouyi
>> 
>> The tests seem to end after 7200 seconds, I will report to you after that.
>> 
>> Thanks
>> Zhouyi
>> [1] https://lore.kernel.org/all/CAABZP2wSoEzfMWRdxGb6TmWVeN4xDUu5qjnG0d8RfaO7AovGZQ@xxxxxxxxxxxxxx/
>>> 
>>> - Joel



