On Sun, 31 May 2020 18:18:38 -0700, Paul E. McKenney wrote: > On Mon, Jun 01, 2020 at 08:11:06AM +0900, Akira Yokosawa wrote: >> On Sun, 31 May 2020 09:50:23 -0700, Paul E. McKenney wrote: >>> On Sun, May 31, 2020 at 09:30:44AM +0900, Akira Yokosawa wrote: >>>> Hi Paul, >>>> >>>> This is misc updates in response to your recent updates. >>>> >>>> Patch 1/3 treats QQZ annotations for "nq" build. >>> >>> Good reminder, thank you! >>> >>>> Patch 2/3 adds a paragraph in #9 of FAQ.txt. The wording may need >>>> your retouch for fluency. >>>> Patch 3/3 is an independent improvement of runlatex.sh. It will avoid >>>> a few redundant runs of pdflatex when you have some typo in labels/refs. >>> >>> Nice, queued and pushed, thank you! >>> >>>> Another suggestion to Figures 9.25 and 9.29. >>>> Wouldn't these graphs look better with log scale x-axis? >>>> >>>> X range can be 0.001 -- 10. >>>> >>>> You'll need to add a few data points in sub-microsecond critical-section >>>> duration to show plausible shapes in those regions, though. >>> >>> I took a quick look and didn't find any nanosecond delay primitives >>> in the Linux kernel, but yes, that would be nicer looking. >>> >>> I don't expect to make further progress on this particular graph >>> in the immediate future, but if you know of such a delay primitive, >>> please don't keep it a secret! ;-) >> >> I find ndelay() defined in include/asm_generic/delay.h. >> I'm not sure if it works as you would expect, though. > > I must be going blind, given that I missed that one! :-) :-) > > I did try it out, and it suffers from about 10% timing errors. In > contrast, udelay is usually less than 1%. You mean udelay(1)'s error is less than 10ns, whereas ndelay(1000)'s error is about 100ns? Looking at the definition of __udelay() and __ndelay() in arch/x86/lib/delay.c, the constant 0x10c7 has much effective bits than 0x0005. This is likely the cause of difference in errors. > But how about as shown below? > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 7d9ab703b0a33ff5f8db330f0bac3dde9deead07 > Author: Paul E. McKenney <paulmck@xxxxxxxxxx> > Date: Sun May 31 18:14:57 2020 -0700 > > refperf: Change readdelay module parameter to nanoseconds > > The current units of microseconds are too coarse, so this commit > changes the units to nanoseconds. However, ndelay is used only for the > nanoseconds with udelay being used for whole microseconds. For example, > setting refperf.readdelay=1500 results in an ndelay(500) followed by > a udelay(1). Your code below looks opposite, udelay(1) + ndelay(500), doesn't it? > > Suggested-by: Akira Yokosawa <akiyks@xxxxxxxxx> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > > diff --git a/kernel/rcu/refperf.c b/kernel/rcu/refperf.c > index 3b72925..96f8ba0 100644 > --- a/kernel/rcu/refperf.c > +++ b/kernel/rcu/refperf.c > @@ -66,8 +66,8 @@ torture_param(long, loops, 10000, "Number of loops per experiment."); > torture_param(int, nreaders, -1, "Number of loops per experiment."); > // Number of runs. > torture_param(int, nruns, 30, "Number of experiments to run."); > -// Reader delay in microseconds, 0 for no delay. > -torture_param(int, readdelay, 0, "Read-side delay in microseconds."); > +// Reader delay in nanoseconds, 0 for no delay. > +torture_param(int, readdelay, 0, "Read-side delay in nanoseconds."); > > #ifdef MODULE > # define REFPERF_SHUTDOWN 0 > @@ -111,7 +111,8 @@ struct ref_perf_ops { > void (*init)(void); > void (*cleanup)(void); > void (*readsection)(const int nloops); > - void (*delaysection)(const int nloops, const int ndelay); > + void (*delaysection)(const int nloops, > + const int udelay, const int ndelay); > const char *name; > }; > > @@ -127,13 +128,17 @@ static void ref_rcu_read_section(const int nloops) > } > } > > -static void ref_rcu_delay_section(const int nloops, const int ndelay) > +static void > +ref_rcu_delay_section(const int nloops, const int udelay, const int ndelay) > { > int i; > > for (i = nloops; i >= 0; i--) { > rcu_read_lock(); > - udelay(ndelay); > + if (udelay) > + udelay(udelay); > + if (ndelay) > + ndelay(ndelay); > rcu_read_unlock(); > } > } > @@ -165,14 +170,18 @@ static void srcu_ref_perf_read_section(const int nloops) > } > } > > -static void srcu_ref_perf_delay_section(const int nloops, const int ndelay) > +static void srcu_ref_perf_delay_section(const int nloops, > + const int udelay, const int ndelay) > { > int i; > int idx; > > for (i = nloops; i >= 0; i--) { > idx = srcu_read_lock(srcu_ctlp); > - udelay(ndelay); > + if (udelay) > + udelay(udelay); > + if (ndelay) > + ndelay(ndelay); > srcu_read_unlock(srcu_ctlp, idx); > } > } > @@ -197,13 +206,17 @@ static void ref_refcnt_section(const int nloops) > } > } > > -static void ref_refcnt_delay_section(const int nloops, const int ndelay) > +static void > +ref_refcnt_delay_section(const int nloops, const int udelay, const int ndelay) > { > int i; > > for (i = nloops; i >= 0; i--) { > atomic_inc(&refcnt); > - udelay(ndelay); > + if (udelay) > + udelay(udelay); > + if (ndelay) > + ndelay(ndelay); > atomic_dec(&refcnt); > } > } > @@ -233,13 +246,17 @@ static void ref_rwlock_section(const int nloops) > } > } > > -static void ref_rwlock_delay_section(const int nloops, const int ndelay) > +static void > +ref_rwlock_delay_section(const int nloops, const int udelay, const int ndelay) > { > int i; > > for (i = nloops; i >= 0; i--) { > read_lock(&test_rwlock); > - udelay(ndelay); > + if (udelay) > + udelay(udelay); > + if (ndelay) > + ndelay(ndelay); > read_unlock(&test_rwlock); > } > } > @@ -269,13 +286,17 @@ static void ref_rwsem_section(const int nloops) > } > } > > -static void ref_rwsem_delay_section(const int nloops, const int ndelay) > +static void > +ref_rwsem_delay_section(const int nloops, const int udelay, const int ndelay) > { > int i; > > for (i = nloops; i >= 0; i--) { > down_read(&test_rwsem); > - udelay(ndelay); > + if (udelay) > + udelay(udelay); > + if (ndelay) > + ndelay(ndelay); Maybe defining a helper function/macro for this common pattern in rcu.h can ease maintenance cost. Say undelay(udl, ndl)? Thanks, Akira > up_read(&test_rwsem); > } > } > @@ -292,7 +313,8 @@ static void rcu_perf_one_reader(void) > if (readdelay <= 0) > cur_ops->readsection(loops); > else > - cur_ops->delaysection(loops, readdelay); > + cur_ops->delaysection(loops, > + readdelay / 1000, readdelay % 1000); > } > > // Reader kthread. Repeatedly does empty RCU read-side >