Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining

On Thu, Oct 27, 2022 at 11:13:46AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 10, 2022 at 09:42:07AM +0800, Pingfan Liu wrote:
> > On Mon, Oct 3, 2022 at 12:20 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Oct 02, 2022 at 09:29:59PM +0800, Pingfan Liu wrote:
> > [...]
> > 
> > > >
> > > > Quote from TREE04.43 console.log
> > > > "
> > > > ...
> > > > [162003.262292] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > > [162003.264714] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > > 0 0 0 0 0 0
> > > > [162003.265336] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > > 0 0 0 0 0 0 0
> > > > [162003.265967] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > > [162003.266980] rcu-torture: rcu_torture_stats is stopping
> > > > [162003.267450] rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x3b0().
> > > > [162003.278480] Stack Depot allocating hash table of 32768 entries with kvcalloc
> > > > [162003.279947] mem_dump_obj() slab test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e01a58000, &z
> > > > = ffffffffa676d9a0
> > > > [162003.281079] mem_dump_obj(ZERO_SIZE_PTR): zero-size pointer
> > > > [162003.281563] mem_dump_obj(NULL): NULL pointer
> > > > [162003.281935] mem_dump_obj(ffffa0aa0035feb8): 4-page vmalloc region
> > > > starting at 0xffffa0aa0035c000 allocated at kernel_clone+0x92/0x3b0
> > > > [162003.283044] mem_dump_obj(ffff965e01a58000): slab rcuscale start
> > > > ffff965e01a58000 pointer offset 0 allocated at
> > > > rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.284138]     kmem_cache_alloc+0x19d/0x1b0
> > > > [162003.284522]     rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.284918]     torture_shutdown+0xce/0x1c0
> > > > [162003.285285]     kthread+0xb7/0xe0
> > > > [162003.285587]     ret_from_fork+0x22/0x30
> > > > [162003.285926] mem_dump_obj(ffff965e01a58008): slab rcuscale start
> > > > ffff965e01a58000 pointer offset 8 allocated at
> > > > rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.287045]     kmem_cache_alloc+0x19d/0x1b0
> > > > [162003.287436]     rcu_torture_cleanup+0x432/0x4e0
> > > > [162003.287835]     torture_shutdown+0xce/0x1c0
> > > > [162003.288202]     kthread+0xb7/0xe0
> > > > [162003.288510]     ret_from_fork+0x22/0x30
> > > > [162003.288848] mem_dump_obj(ffffffffa676d9a0): non-slab/vmalloc memory
> > > > [162003.289426] mem_dump_obj() kmalloc test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffff965e144447e0
> > > > [162003.290421] mem_dump_obj(kmalloc ffff965e144447e0): slab
> > > > kmalloc-16 start ffff965e144447e0 pointer offset 0 size 16
> > > > [162003.291316] mem_dump_obj(kmalloc ffff965e144447e8): slab
> > > > kmalloc-16 start ffff965e144447e0 pointer offset 8 size 16
> > > > [162003.292211] mem_dump_obj() vmalloc test: rcu_torture_stats =
> > > > 0000000000000000, &rhp = ffffa0aa0035feb8, rhp = ffffa0aa0002d000
> > > > [162003.293186] mem_dump_obj(vmalloc ffffa0aa0002d000): 1-page vmalloc
> > > > region starting at 0xffffa0aa0002d000 allocated at
> > > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > > [162003.294372] mem_dump_obj(vmalloc ffffa0aa0002d008): 1-page vmalloc
> > > > region starting at 0xffffa0aa0002d000 allocated at
> > > > rcu_torture_cleanup.cold+0x287/0x3ef
> > > > [162003.295548] rcu-torture: rtc: 0000000000000000 VER: 4371112 tfle:
> > > > 0 rta: 4371112 rtaf: 0 rtf: 4371103 rtmbe: 0 rtmbkf: 0/1801158 rtbe: 0
> > > > rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 206435148 onoff:
> > > > 189317/189317:189321/189321 1,1877:1,3248 25561605:45469848 (HZ=1000)
> > > > barrier: 769073/769073:0 read-exits: 185104 nocb-toggles: 0:0
> > > > [162003.297956] rcu-torture: Reader Pipe:  338936620402 7452360 0 0 0
> > > > 0 0 0 0 0 0
> > > > [162003.298578] rcu-torture: Reader Batch:  338929088717 14984045 0 0
> > > > 0 0 0 0 0 0 0
> > > > [162003.299208] rcu-torture: Free-Block Circulation:  4371111 4371111
> > > > 4371110 4371109 4371108 4371107 4371106 4371105 4371104 4371103 0
> > > > [162003.300220] rcu-torture:--- End of test: SUCCESS: nreaders=7
> > > > nfakewriters=4 stat_interval=15 verbose=1 test_no_idle_hz=1
> > > > shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0
> > > > fqs_stutter=3 test_boost=1/0 test_boost_interval=7
> > > > test_boost_duration=4 shutdown_secs=162000 stall_cpu=0
> > > > stall_cpu_holdoff=10 stall_cpu_irqsoff=0 stall_cpu_block=0
> > > > n_barrier_cbs=4 onoff_interval=200 onoff_holdoff=30 read_exit_delay=13
> > > > read_exit_burst=16 nocbs_nthreads=0 nocbs_toggle=1000
> > > > [162003.304162] i2c i2c-0: shutdown
> > > > [162003.304461] alarmtimer alarmtimer.0.auto: shutdown
> > > > [162003.304878] psmouse serio1: shutdown
> > > > ...
> > > > "
> > >
> > > Huh.  Shutdown writer stall on the first one, but I am not seeing the
> > > bug in the second one.  Was the "Call Trace" located somewhere else in
> > > the console.log file?
> > >
> > > I have not seen the shutdown writer stall lately, but such stalls have
> > > flagged many sorts of bugs for me to fix in the past.
> > >
> > >
> > > So maybe there is yet another bug that my testing is for whatever reason
> > > not seeing.
> > 
> > Sorry to reply late, I just came back from a long holiday.
> 
> Given my delay, I guess that I cannot complain...
> 

Thanks for the insight into the log. I am still a little lost in the whole
torture test; I will spend more time studying it.

> > The attachment is the complete log of TREE04.43.
> 
> This happens because rcutorture hit OOM while doing RCU callback flood
> testing.  This -might- be related to the changes, for example if timers
> slowed down, thus slowing down RCU grace periods, thus allowing the
> kthread doing the flooding to exhaust memory.
> 
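Just to make the connection between the changes and the tick concrete:
mainline sets the dependency with tick_dep_set(TICK_DEP_BIT_RCU) in
rcutree_offline_cpu() and clears it with tick_dep_clear(TICK_DEP_BIT_RCU)
in rcutree_dead_cpu(), both acting on a global mask, so with concurrent
offlining one CPU's clear can drop the dependency while another offliner
still needs the tick.  The coordination in $subject is roughly the idea
below (a simplified sketch rather than the exact code in the patch; the
counter and helper names here are only illustrative):

static atomic_t rcu_offline_tick_dep = ATOMIC_INIT(0);

/* Early in the offline path, e.g. where rcutree_offline_cpu() runs today. */
static void rcu_offline_tick_dep_set(void)
{
        /* The first concurrent offliner turns the dependency on. */
        if (atomic_inc_return(&rcu_offline_tick_dep) == 1)
                tick_dep_set(TICK_DEP_BIT_RCU);
}

/* Late in the offline path, e.g. where rcutree_dead_cpu() runs today. */
static void rcu_offline_tick_dep_clear(void)
{
        /* Only the last concurrent offliner may clear the dependency. */
        if (atomic_dec_and_test(&rcu_offline_tick_dep))
                tick_dep_clear(TICK_DEP_BIT_RCU);
}

The atomic counter keeps each offliner's set/clear paired while touching
the global mask only at the 0<->1 transitions.
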
> Grace periods are progressing in the histogram:
> 
> [12077.630343] rcu_torture_fwd_cb_hist: Callback-invocation histogram 0 (duration 1321 jiffies): 1s/10: 75043:17 2s/10: 137085:35 3s/10: 143871:14 4s/10: 66038:26 5s/10: 51754:18 6s/10: 67000:16 7s/10: 54155:17 8s/10: 56749:23 9s/10: 120850:8 10s/10: 87789:17 11s/10: 111123:32 12s/10: 136322:50 13s/10: 108813:33 14s/10: 138489:1
> [12077.632824] rcu: rcu_fwd_progress_check: GP age 17 jiffies
> 
> And there is this:
> 
> [12077.644478] rcu: RCU callbacks invoked since boot: 102942584
> [12077.644974] rcu: rcu_fwd_progress_check: callbacks 0: 45 4: 7973717
> [12077.695770] rcutorture_oom_notify: Freed 592481 RCU callbacks.
> 
> So another question is "how much memory was available to rcutorture?"
> 
> Less than 512MB could explain this.
> 

Each machine has 262144 MB of RAM and runs 32 TREE04 instances, so each
instance has about 8192 MB of RAM. (I do not run any other load.)


Thanks,

	Pingfan

> (I do see this from time to time, but on single-CPU TREE09 runs.)
> 
> 							Thanx, Paul


