Re: [PATCHv2 3/3] rcu: coordinate tick dependency during concurrent offlining

On Mon, Sep 26, 2022 at 03:23:52PM -0700, Paul E. McKenney wrote:
> On Mon, Sep 26, 2022 at 02:34:17PM +0800, Pingfan Liu wrote:
> > Sorry for the late reply. I just realized this e-mail was missing from my gmail.
> > 
> > On Thu, Sep 22, 2022 at 06:54:42AM -0700, Paul E. McKenney wrote:
> > [...]
> > > 
> > > If you have tools/.../rcutorture/bin on your path, yes.  This would default
> > > to a 30-minute run.  If you have at least 16 CPUs, you should add
> >                                             ^^^ TREE04 has CONFIG_NR_CPUS=8, so I think the number here is 8
> 
> Yes, you will get some benefit from --allcpus on systems with 9-15
> CPUs as well as on those with 16 or more.  At 8 CPUs, it wouldn't matter.
> 
> > > "--allcpus" to do concurrent runs.  For example, given 64 CPUs you could
> > > do this:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10h --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "4*TREE04"
> > > 
> > 
> > I have found a two-socket system with 128 CPUs and run
> >   sh kvm.sh --allcpus --duration 250h --bootargs rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 --configs 16*TREE04
> > 
> > Where 250*16=4000
> 
> That would work.
> 

This job has now run successfully for 24+ hours. (But I may only be able
to keep the machine for about 180 hours.)
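
For reference, a rough tally of what a 180-hour window would yield on this machine, assuming all 16 instances keep running for the whole window. The variable names are only for illustration, and the 4000-hour target is the figure quoted earlier in the thread:

```shell
#!/bin/sh
# Figures from this thread; variable names are illustrative only.
instances=16          # --configs 16*TREE04 on the 128-CPU system
available_hours=180   # assumed upper bound on how long the machine is available
target_hours=4000     # total TREE04 test time quoted earlier in the thread

accumulated=$(( instances * available_hours ))
shortfall=$(( target_hours - accumulated ))

echo "accumulated=${accumulated}h shortfall=${shortfall}h"
```

So a single 128-CPU machine held for 180 hours would fall short of the 4000-hour target, which is one reason to spread the run over several systems.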

> > > This would run four concurrent instances of the TREE04 scenario, each for
> > > 10 hours, for a total of 40 hours of test time.
> > > 
> > > > > It does take some time to run.  I did 4,000 hours worth of TREE04
> > > >                                         ^^^ can '--duration=4000h' serve this purpose?
> > > 
> > > You could, at least if you replace the "=" with a space character, but
> > > that really would run a six-month test, which is probably not what you
> > > want to do.  There being 8,760 hours in a year and all that.
> > > 
> > > > Is it related to the CPU's frequency?
> > > 
> > > Not at all.  '--duration 10h' would run ten hours of wall-clock time
> > > regardless of the CPU frequencies.
> > > 
> > > > > to confirm lack of bug.  But an 80-CPU dual-socket system can run
> > > > > 10 concurrent instances of TREE04, which gets things down to a more
> > > > 
> > > > So the total required hours H = 4000/(system_cpu_num/8)?
> > > 
> > > Yes.  You can also use multiple systems, which is what kvm-remote.sh is
> > > intended for, again assuming 80 CPUs per system to keep the arithmetic
> > > simple:
> > > 
> > > tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 ... sys20" --duration 20h --cpus 80 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "200*TREE04"
> > > 
> > 
> > That is appealing.
> > 
> > I will see whether there is any opportunity to get hold of a batch of
> > machines to run the test.
> 
> Initial tests with smaller numbers of CPUs are also useful, for example,
> in case reversion causes some bug due to bad interaction with a later
> commit.
> 
> Please let me know how it goes!
> 

I have managed to get hold of three two-socket machines, each with 256 CPUs.
The test has run for about 7 hours so far without any problem, using the following command:
tools/testing/selftests/rcutorture/bin/kvm-remote.sh "sys1 sys2 sys3" \
--duration 45h --cpus 256 --bootargs "rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30" --configs "96*TREE04"

It seems promising.
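
The sizing behind that command can be sketched as below. This is only the arithmetic from this thread (8 CPUs per TREE04 instance, 4000 hours of total test time as the target), with illustrative variable names; it does not invoke kvm-remote.sh:

```shell
#!/bin/sh
# Figures from this thread; variable names are illustrative only.
systems=3              # sys1 sys2 sys3
cpus_per_system=256
cpus_per_instance=8    # TREE04 has CONFIG_NR_CPUS=8
target_hours=4000      # total TREE04 test time quoted earlier in the thread
duration_hours=45      # value passed to --duration

# Number of concurrent TREE04 instances that fit on all systems together.
instances=$(( systems * cpus_per_system / cpus_per_instance ))

# Smallest whole-hour duration that reaches the target (integer round-up).
min_duration=$(( (target_hours + instances - 1) / instances ))

# Total instance-hours accumulated if the run completes.
total_hours=$(( instances * duration_hours ))

echo "instances=$instances min_duration=${min_duration}h total=${total_hours}h"
```

With these numbers, 96 instances need at least 42 hours each, so the 45-hour run leaves a small margin over the target.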


Thanks,

	Pingfan


