Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 30, 2022 at 11:45 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
[...]
> > > I have managed to grasp three two-socket machine, each has 256 cpus.
> > > The test has run about 7 hours till now without any problem by the following command:
> > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
> > > --duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"
> > >
> > > It seems promising.
> > >
> >
> > The test is against v6.0-rc7 kernel, and only with 96926686deab ("rcu:
> > Make CPU-hotplug removal operations enable tick") reverted. It is
> > close to the end, but unfortunately it fails.
> > Quote from remote-log
> > "
> > TREE04.57 ------- 4410955 GPs (27.2281/s) [rcu: g36045577 f0x0
> > total-gps=9011687] n_max_cbs: 4111392
> > TREE04.58 ------- 4368391 GPs (26.9654/s) [rcu: g35630093 f0x0
> > total-gps=8907816] n_max_cbs: 2411104
> > TREE04.59 ------- 800516 GPs (4.94146/s) n_max_cbs: 3634471
> > QEMU killed
> > TREE04.59 no success message, 10547 successful version messages
> > ^[[033mWARNING: ^[[mTREE04.59 GP HANG at 800516 torture stat 1925
> > ^[[033mWARNING: ^[[mAssertion failure in
> > /home/linux/tools/testing/selftests/rcutorture/res/2022.09.26-23.33.34-remote/TREE04.59/console.log
> > TREE04.59
> > ^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 8615
> > TREE04.6 ------- 4348443 GPs (26.8422/s) [rcu: g35341129 f0x0
> > total-gps=8835575] n_max_cbs: 2329432
>
> First, thank you for running this!
>
> This is not the typical failure that we were seeing, which would show
> up as a 2.199.0-second RCU CPU stall during which time there would be
> no console messages.
>
> But please do let me know how continuing tests go!
>

This time, the same test environment except against v6.0-rc7 mainline,
also encountered the not typical failure.

Quote from remote-log:
"
TREE04.11 ------- 555429 GPs (3.42857/s) n_max_cbs: 1907355
QEMU killed
TREE04.11 no success message, 10547 successful version messages
^[[033mWARNING: ^[[mTREE04.11 GP HANG at 555429 torture stat 1330
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.11/console.log
TREE04.11
^[[033mWARNING: ^[[mSummary: Call Traces: 1 Stalls: 9213
TREE04.12 ------- 4379523 GPs (27.0341/s) [rcu: g35803501 f0x0
total-gps=8951168] n_max_cbs: 2585955
...
TREE04.42 ------- 4383911 GPs (27.0612/s) [rcu: g35706861 f0x0
total-gps=8927008] n_max_cbs: 2544071
TREE04.43 ------- 4370806 GPs (26.9803/s) [rcu: g35793529 f0x0
total-gps=8948675] n_max_cbs: 2490251
^[[033mWARNING: ^[[mAssertion failure in
/home/linux/tools/testing/selftests/rcutorture/res/2022.09.29-00.28.29-remote/TREE04.43/console.log
TREE04.43
^[[033mWARNING: ^[[mSummary: Warnings: 1 Call Traces: 1
TREE04.44 ------- 4379942 GPs (27.0367/s) [rcu: g35806165 f0x0
total-gps=8951834] n_max_cbs: 1473528
...
TREE04.95 ------- 4850446 GPs (29.941/s) [rcu: g38867685 f0x0
total-gps=9717212] n_max_cbs: 2917447
TREE04.96 ------- 4880570 GPs (30.127/s) [rcu: g39023821 f0x0
total-gps=9756247] n_max_cbs: 1267969
2 runs with runtime errors.
 --- Done at Fri Sep 30 09:35:31 PM EDT 2022 (1d 21:07:02) exitcode 2
"

Quote from TREE04.11 console.log
"
...
[162001.301549] rcu_torture_fwd_prog n_max_cbs: 281053
[162001.301994] rcu_torture_fwd_prog: Starting forward-progress test 0
[162001.302531] rcu_torture_fwd_prog_cr: Starting forward-progress test 0
[162001.308449] rcu_torture_fwd_prog_cr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.352445] rcu_torture_fwd_prog_nr: Starting forward-progress test 0
[162002.360429] rcu_torture_fwd_prog_nr: Waiting for CBs:
rcu_barrier+0x0/0x3b0() 0
[162002.464428] rcu_torture_fwd_prog: tested 2104 tested_tries 2105
[162002.464989] rcu-torture: rcu_torture_fwd_prog is stopping
[162002.465518] rcu-torture: Stopping rcu_torture_writer task
[162005.040419] rcu-torture: rtc: 0000000089bf6f56 ver: 555429 tfle: 0
rta: 555430 rtaf: 0 rtf: 555420 rtmbe: 0 rtmbkf: 0/234319 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206274892 onoff:
184783/184783:184787/184787 1,2001:2,3303 25759026:47497333 (HZ=1000)
barrier: 773866/773866:0 read-exits: 185056 nocb-toggles: 0:0
[162005.042858] rcu-torture: Reader Pipe:  344206401977 936546 0 0 0 0 0 0 0 0 0
[162005.043504] rcu-torture: Reader Batch:  344194661258 12677265 0 0
0 0 0 0 0 0 0
[162005.044047] rcu-torture: Free-Block Circulation:  555429 555428
555427 555426 555425 555424 555423 555422 555421 555420 0
[162005.044857] ??? Writer stall state RTWS_COND_SYNC_EXP(8) g30399832
f0x0 ->state 0x2 cpu 1
[162005.045474] rcu: rcu_sched: wait state: RCU_GP_WAIT_GPS(1)
->state: 0x402 ->rt_priority 0 delta ->gp_start 2575 ->gp_activity
2571 ->gp_req_activity 2575 ->gp_wake_time 2575 ->gp_wake_seq 30399828
->gp_seq 30399832 ->gp_seq_needed 30399832 ->gp_max 1118 ->gp_flags
0x0
[162005.047265] rcu:    CB 1^0->2 KbclSW F3852 L3852 C5 ..... q0 S CPU 0
[162005.047856] rcu:    CB 2^0->3 KbclSW F3747 L3747 C2 ..... q0 S CPU 0
[162005.048415] rcu:    CB 3^0->-1 KbclSW F2583 L2583 C2 ..... q0 S CPU 0
[162005.048992] rcu: nocb GP 4 KldtS W[..] ..:0 rnp 4:7 2042607 S CPU 0
[162005.049569] rcu:    CB 4^4->5 KbclSW F3839 L3839 C6 ..... q0 S CPU 0
[162005.050040] rcu:    CB 5^4->6 KbclSW F4449 L4449 C0 ..... q0 S CPU 0
[162005.050515] rcu:    CB 6^4->7 KbclSW F3784 L3785 C1 ..... q0 S CPU 0
[162005.050984] rcu:    CB 7^4->-1 KbclSW F3837 L3837 C7 ..... q0 S CPU 0
[162005.051478] rcu: RCU callbacks invoked since boot: 1422160737
[162005.051908] rcu-torture: rcu_torture_stats is stopping
"

Quote from TREE04.43 console.log
"
...
[162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
[162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
0 0 0 0 0 0
[162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
0 0 0 0 0 0 0
[162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
[162003.266980] rcu-torture: rcu_torture_stats is stopping
[162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
[162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
[162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
= ffffffffa676d9a0
[162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
[162003.281563] mem_dump_obj(NULL): NULL pointer
[162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
[162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
ffff965e01a58000 pointer offset 0 allocated at
rcu_torture_cleanup+0x432/0x4e0
[162003.284138]     kmem_cache_alloc+0x19d/0x1b0
[162003.284522]     rcu_torture_cleanup+0x432/0x4e0
[162003.284918]     torture_shutdown+0xce/0x1c0
[162003.285285]     kthread+0xb7/0xe0
[162003.285587]     ret_from_fork+0x22/0x30
[162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
ffff965e01a58000 pointer offset 8 allocated at
rcu_torture_cleanup+0x432/0x4e0
[162003.287045]     kmem_cache_alloc+0x19d/0x1b0
[162003.287436]     rcu_torture_cleanup+0x432/0x4e0
[162003.287835]     torture_shutdown+0xce/0x1c0
[162003.288202]     kthread+0xb7/0xe0
[162003.288510]     ret_from_fork+0x22/0x30
[162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
[162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
[162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
[162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
[162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
[162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
region starting at 0xffffa0aa0002d000 allocated at
rcu_torture_cleanup.cold+0x287/0x3ef
[162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
region starting at 0xffffa0aa0002d000 allocated at
rcu_torture_cleanup.cold+0x287/0x3ef
[162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
[162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
0 0 0 0 0 0
[162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
0 0 0 0 0 0 0
[162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
[162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
fqs_stutter=3 test_boost=1/0 test_boost_interval=7
test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
[162003.304162] i2c i2c-0: shutdown
[162003.304461] alarmtimer alarmtimer.0.auto: shutdown
[162003.304878] psmouse serio1: shutdown
...
"

I remember if reverting the commits suggested by Frederic, the failure
would occur within an hour.

But unfortunately, I did not keep the data. I will run it again and
submit the data.


Thanks,

Pingfan



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux