On Wed, Aug 28, 2019 at 02:12:26PM -0700, Paul E. McKenney wrote: > On Tue, Aug 27, 2019 at 03:01:55PM -0400, Joel Fernandes (Google) wrote: > > This test runs kfree_rcu() in a loop to measure performance of the new > > kfree_rcu() batching functionality. > > > > The following table shows results when booting with arguments: > > rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 > > > > In addition, rcuperf.kfree_no_batch is used to toggle the batching of > > kfree_rcu()s for a test run. > > > > patch applied GPs time (seconds) > > yes 1732 14.5 > > no 9133 11.5 > > This is really "rcuperf.kfree_no_batch" rather than "patch applied", right? > (Yes, we did discuss this last time around, but this table combined with > the prior paragraph is still ambiguous.) Please make it unambiguous. > One way to do that is as follows: > > ------------------------------------------------------------------------ > > The following table shows results when booting with arguments: > rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_no_batch=X > > rcuperf.kfree_no_batch=X # Grace Periods Test Duration (s) > X=1 (old behavior) 9133 11.5 > X=0 (new behavior) 1732 14.5 Yes you are right, will fix. The reason I changed it to 'patch applied' is because the last patch in the series removes kfree_no_batch. Will fix! thanks! > > On a 16 CPU system with the above boot parameters, we see that the total > > number of grace periods that elapse during the test drops from 9133 when > > not batching to 1732 when batching (a 5X improvement). The kfree_rcu() > > flood itself slows down a bit when batching, though, as shown. > > This last sentence would be more clear as something like: "However, > use of batching increases the duration of the kfree_rcu()-flood test." > > > Note that the active memory consumption during the kfree_rcu() flood > > does increase to around 200-250MB due to the batching (from around 50MB > > without batching). However, this memory consumption is relatively > > constant. In other words, the system is able to keep up with the > > kfree_rcu() load. The memory consumption comes down considerably if > > KFREE_DRAIN_JIFFIES is increased from HZ/50 to HZ/80. > > That would be a decrease rather than an increase in KFREE_DRAIN_JIFFIES, > correct? > > This would also be a good place to mention that a later patch will > decrease consumption, but that is strictly optional. However, you did > introduce the topic of changing KFREE_DRAIN_JIFFIES, so if a later patch > changes this value, this would be an excellent place to mention this. Fixed. [snip] > > +/* > > + * kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number > > + * of iterations and measure total time and number of GP for all iterations to complete. > > + */ > > + > > +torture_param(int, kfree_nthreads, -1, "Number of threads running loops of kfree_rcu()."); > > +torture_param(int, kfree_alloc_num, 8000, "Number of allocations and frees done in an iteration."); > > +torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num allocations and frees."); > > +torture_param(int, kfree_no_batch, 0, "Use the non-batching (slower) version of kfree_rcu()."); > > + > > +static struct task_struct **kfree_reader_tasks; > > +static int kfree_nrealthreads; > > +static atomic_t n_kfree_perf_thread_started; > > +static atomic_t n_kfree_perf_thread_ended; > > + > > +struct kfree_obj { > > + char kfree_obj[8]; > > + struct rcu_head rh; > > +}; > > + > > +static int > > +kfree_perf_thread(void *arg) > > +{ > > + int i, loop = 0; > > + long me = (long)arg; > > + struct kfree_obj *alloc_ptr; > > + u64 start_time, end_time; > > + > > + VERBOSE_PERFOUT_STRING("kfree_perf_thread task started"); > > + set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); > > + set_user_nice(current, MAX_NICE); > > + > > + start_time = ktime_get_mono_fast_ns(); > > + > > + if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) { > > + if (gp_exp) > > + b_rcu_gp_test_started = cur_ops->exp_completed() / 2; > > At some point, it would be good to use the new grace-period > sequence-counter functions (rcuperf_seq_diff(), for example) instead of > the open-coded division by 2. I freely admit that you are just copying > my obsolete hack in this case, so not needed in this patch. But I am using rcu_seq_diff() below in the pr_alert(). Anyway, I agree this can be a follow-on since this pattern is borrowed from another part of rcuperf. However, I am also confused about the pattern itself. If I understand, you are doing the "/ 2" because expedited_sequence progresses by 2 for every expedited batch. But does rcu_seq_diff() really work on these expedited GP numbers, and will it be immune to changes in RCU_SEQ_STATE_MASK? Sorry for the silly questions, but admittedly I have not looked too much yet into expedited RCU so I could be missing the point. > > + else > > + b_rcu_gp_test_finished = cur_ops->get_gp_seq(); > > + > > + pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld\n", > > + (unsigned long long)(end_time - start_time), kfree_loops, > > + rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started)); > > + if (shutdown) { > > + smp_mb(); /* Assign before wake. */ > > + wake_up(&shutdown_wq); > > + } > > + } > > + > > + torture_kthread_stopping("kfree_perf_thread"); > > + return 0; > > +} > > + > > +static void > > +kfree_perf_cleanup(void) > > +{ > > + int i; > > + > > + if (torture_cleanup_begin()) > > + return; > > + > > + if (kfree_reader_tasks) { > > + for (i = 0; i < kfree_nrealthreads; i++) > > + torture_stop_kthread(kfree_perf_thread, > > + kfree_reader_tasks[i]); > > + kfree(kfree_reader_tasks); > > + } > > + > > + torture_cleanup_end(); > > +} > > + > > +/* > > + * shutdown kthread. Just waits to be awakened, then shuts down system. > > + */ > > +static int > > +kfree_perf_shutdown(void *arg) > > +{ > > + do { > > + wait_event(shutdown_wq, > > + atomic_read(&n_kfree_perf_thread_ended) >= > > + kfree_nrealthreads); > > + } while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads); > > + > > + smp_mb(); /* Wake before output. */ > > + > > + kfree_perf_cleanup(); > > + kernel_power_off(); > > + return -EINVAL; > > These last four lines should be combined with those of > rcu_perf_shutdown(). Actually, you could fold the two functions together > with only a pair of arguments and two one-line wrapper functions, which > would be even better. But the cleanup() function is different in the 2 cases and will have to be passed in as a function pointer. I believe we discussed this last review as well. thanks, - Joel