On 9/10/20 11:51 AM, Paul E. McKenney wrote:
On Thu, Sep 10, 2020 at 11:33:58AM -0700, Alexei Starovoitov wrote:
On 9/9/20 10:27 PM, Paul E. McKenney wrote:
On Wed, Sep 09, 2020 at 02:22:12PM -0700, Paul E. McKenney wrote:
On Wed, Sep 09, 2020 at 02:04:47PM -0700, Paul E. McKenney wrote:
On Wed, Sep 09, 2020 at 12:48:28PM -0700, Alexei Starovoitov wrote:
On Wed, Sep 09, 2020 at 12:39:00PM -0700, Paul E. McKenney wrote:
[ . . . ]
My plan is to try the following:
1. Parameterize the backoff sequence so that RCU Tasks Trace
uses faster rechecking than does RCU Tasks. Experiment as
needed to arrive at a good backoff value (see the sketch below).
2. If the tasks-list scan turns out to be a tighter bottleneck
than the backoff waits, look into parallelizing this scan.
(This seems unlikely, but the fact remains that RCU Tasks
Trace must do a bit more work per task than RCU Tasks.)
3. If these two approaches still don't get the update-side
latency where it needs to be, improvise.
The exact path into mainline will of course depend on how far down this
list I must go, but the first step is to get a solution.
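
For concreteness, a minimal sketch of the per-flavor backoff
parameterization that item 1 above describes; every name and value below
is an illustrative stand-in, not the actual patches:

#include <linux/list.h>
#include <linux/sched.h>

struct tasks_gp_params_sketch {
	unsigned long init_delay;	/* First recheck delay (jiffies). */
	unsigned long max_delay;	/* Backoff ceiling (jiffies). */
};

/* Hypothetical helper that rescans the holdout list. */
static void recheck_holdouts_sketch(struct list_head *holdouts);

static void wait_for_holdouts_sketch(struct tasks_gp_params_sketch *p,
				     struct list_head *holdouts)
{
	unsigned long delay = p->init_delay;

	while (!list_empty(holdouts)) {
		schedule_timeout_idle(delay);		/* Back off... */
		recheck_holdouts_sketch(holdouts);	/* ...then recheck. */
		if (delay < p->max_delay)
			delay *= 2;	/* Exponential backoff, capped. */
	}
}

/* RCU Tasks Trace would start rechecking sooner than RCU Tasks would: */
static struct tasks_gp_params_sketch tasks_trace_params_sketch = {
	.init_delay = HZ / 200,
	.max_delay = HZ / 10,
};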
I think there is also a case 4: nothing is inside an rcu_trace critical section.
I would expect a single IPI would confirm that.
Unless the task moves, yes. So a single IPI should suffice in the
common case.
And what I am doing now is checking code paths.
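
A hedged sketch of the single-IPI check being discussed, assuming a
hypothetical per-task reader-nesting counter; the ->trc_reader_nesting
field name and the *_sketch helpers here are illustrative, not the
actual RCU Tasks Trace code:

#include <linux/sched.h>
#include <linux/smp.h>

struct reader_check_sketch {
	struct task_struct *task;
	bool in_reader;		/* Conservatively assume "yes" if unsure. */
};

static void reader_check_ipi_sketch(void *arg)
{
	struct reader_check_sketch *res = arg;

	/*
	 * Runs on what was the task's CPU.  If the task is still running
	 * here and its (illustrative) nesting count is zero, it is not
	 * inside a read-side critical section.  If the task has moved,
	 * leave in_reader set so the grace period still waits on it.
	 */
	if (current == res->task)
		res->in_reader = READ_ONCE(current->trc_reader_nesting) != 0;
}

static bool task_in_reader_sketch(struct task_struct *t)
{
	struct reader_check_sketch res = { .task = t, .in_reader = true };

	/* Final argument 1: wait for the handler so that res is stable. */
	smp_call_function_single(task_cpu(t), reader_check_ipi_sketch, &res, 1);
	return res.in_reader;
}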
And the following diff from a set of three patches gets my average
RCU Tasks Trace grace-period latencies down to about 20 milliseconds,
almost a 50x improvement from earlier today.
These are still quite rough and not yet suited for production use, but
I will be testing. If that goes well, I hope to send a more polished
set of patches by end of day tomorrow, Pacific Time. But if you get a
chance to test them, I would value any feedback that you might have.
These patches do not require hand-tuning; instead, they adjust their
behavior according to CONFIG_TASKS_TRACE_RCU_READ_MB, which in turn
adjusts according to CONFIG_PREEMPT_RT. So you should get the desired
latency reductions "out of the box", again without tuning.
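
A rough illustration of what "adjusting according to
CONFIG_TASKS_TRACE_RCU_READ_MB" could look like; the specific delays and
the helper name are made up, but the idea is that the default keys off
the Kconfig option rather than a hand-set knob:

#include <linux/kernel.h>

static unsigned long tasks_trace_recheck_delay_sketch(void)
{
	if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
		return HZ / 10;	/* Favor low CPU overhead (PREEMPT_RT case). */
	return HZ / 200;	/* Favor short grace-period latency. */
}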
Great. Confirming improvement :)
time ./test_progs -t trampoline_count
#101 trampoline_count:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
real 0m2.897s
user 0m0.128s
sys 0m1.527s
This is without CONFIG_TASKS_TRACE_RCU_READ_MB, of course.
Good to hear, thank you!
Or is more required? I can tweak to get more. There is never a free
lunch, though, and in this case the downside of further tweaking would
be greater CPU overhead. Alternatively, I could just as easily tweak
it to be slower, thereby reducing the CPU overhead.
If I don't hear otherwise, I will assume that the current settings
work fine.
Now it looks like sync rcu_tasks_trace is not slower than
rcu_tasks, so it would only make sense to accelerate both at the
same time.
I think for now it's good.
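
If both flavors do need to be waited on together, one way to overlap the
two grace periods rather than paying for them back to back is
synchronize_rcu_mult(); a sketch, assuming both flavors are configured
in (this is not necessarily what the BPF trampoline code does):

#include <linux/rcupdate_wait.h>

static void wait_for_both_flavors_sketch(void)
{
	/*
	 * Wait for an RCU Tasks and an RCU Tasks Trace grace period
	 * concurrently, so the total wait is roughly the longer of the
	 * two rather than their sum.
	 */
	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
}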
Of course, if people start removing thousands of BPF programs at one go,
I suspect that it will be necessary to provide a bulk-removal operation,
similar to some of the bulk-configuration-change operations provided by
networking. The idea is to have a single RCU Tasks Trace grace period
cover all of the thousands of BPF removal operations.
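
As a rough sketch of the bulk idea (every *_sketch name here is
hypothetical): unlink all of the to-be-removed programs first, wait for
a single RCU Tasks Trace grace period, and only then free them all, so
that one grace period is amortized over the whole batch:

#include <linux/bpf.h>
#include <linux/rcupdate.h>

static void unlink_prog_sketch(struct bpf_prog *prog);	/* Hypothetical. */
static void free_prog_sketch(struct bpf_prog *prog);	/* Hypothetical. */

static void bulk_remove_sketch(struct bpf_prog **progs, int n)
{
	int i;

	for (i = 0; i < n; i++)
		unlink_prog_sketch(progs[i]);	/* Make them all unreachable. */

	synchronize_rcu_tasks_trace();		/* One grace period for all N. */

	for (i = 0; i < n; i++)
		free_prog_sketch(progs[i]);	/* Now safe to free. */
}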
A bulk API won't really work for user space.
There is no good way to coordinate attaching different progs (or the
same prog) to many different places.