Re: [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Tue, 20 Sep 2022 12:03:06 -0700

On Tue, Sep 20, 2022 at 05:55:53PM +0000, Joel Fernandes wrote:
> Sorry for slightly delayed reply, today was crazy at work.

I know that feeling!

> On Tue, Sep 20, 2022 at 05:32:08AM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 20, 2022 at 07:40:01AM -0400, Joel Fernandes wrote:
> > > +rcu@vger for archives.
> > > On Tue, Sep 20, 2022 at 7:37 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > > > > On 9/19/2022 7:40 PM, Joel Fernandes wrote:
> > > > >> On 9/19/2022 6:32 PM, Paul E. McKenney wrote:
> > > > >>> On Mon, Sep 19, 2022 at 05:38:19PM -0400, Joel Fernandes wrote:
> > > > >>>> On Mon, Sep 19, 2022 at 9:11 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > > >>>>>
> > > > >>>>> On Mon, Sep 19, 2022 at 10:08:28AM +0200, Uladzislau Rezki wrote:
> > > > >>>>>> On Wed, Sep 14, 2022 at 04:22:51AM -0700, Paul E. McKenney wrote:
> > > > >>>>>>> On Wed, Sep 14, 2022 at 11:38:32AM +0100, Uladzislau Rezki wrote:
> > > > >>>>>>>> On Wed, Sep 14, 2022, 11:23 Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> On Wed, Sep 14, 2022 at 02:34:23AM -0700, Paul E. McKenney wrote:
> > > > >>>>>>>>>> Agreed.  There could be a list of functions in call_rcu() that are
> > > > >>>>>>>>>> treated specially.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> But please be careful.  Maintainers of a given function might have
> > > > >>>>>>>>>> something to say about that function being added to that list.  ;-)
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> But one big advantage of this is that the list of functions could be
> > > > >>>>>>>>>> private to ChromeOS and/or Android.  Maintainers might still be annoyed
> > > > >>>>>>>>>> about bugs sent to them whose root cause was addition of their function
> > > > >>>>>>>>>> to the call_rcu() list, though.  ;-)
> > > > >>>>>>>>>
> > > > >>>>>>>>> Are you sure you want this feature to be usable and debuggable by Android
> > > > >>>>>>>>> only?
> > > > >>>>>>>>>
> > > > >>>>>>>> Adding on top of Frederic sentence. Sorry for that 😀
> > > > >>>>>>>>
> > > > >>>>>>>> Do you mean to have the function names internally to be able to detect what
> > > > >>>>>>>> is called and based on that func list place them in a lazy track?
> > > > >>>>>>>
> > > > >>>>>>> Yes, if needed.  But in addition to using call_rcu_lazy() where the
> > > > >>>>>>> callback functions' maintainers are good with it.
> > > > >>>>>>>
> > > > >>>>>>> Of course, if all relevant maintainers are OK with all the callback
> > > > >>>>>>> functions being passed to call_rcu_lazy(), then their is no need for
> > > > >>>>>>> the internal switching.
> > > > >>>>>>>
> > > > >>>>>> Then we would need map function names internally. I think what we got as
> > > > >>>>>> a feedback from the conference session is it makes sense to improve the
> > > > >>>>>> batching in general of call_rcu(). Doing it we will improve and cover
> > > > >>>>>> all current users. So we do not need to distinguish lazy variant and a
> > > > >>>>>> normal one from each other. Also we do not need to explain people the
> > > > >>>>>> difference and switch to _lazy() across the kernel code. As we also
> > > > >>>>>> discussed one regular non-lazy call will revert everything back to a
> > > > >>>>>> problem that is in question.
> > > > >>>>>>
> > > > >>>>>> As Thomas said, look at it as: delay of invocation and if needed in
> > > > >>>>>> corner case do kind of expedited or sync support. For example call_rcu_sync().
> > > > >>>>>>
> > > > >>>>>> It is not up to me but... I just share my view :)
> > > > >>>>>
> > > > >>>>> Me, I am a bit pessimistic about the probability of us getting away with
> > > > >>>>> making the file-close RCU callback be lazy.
> > > > >>>>>
> > > > >>>>> After all, this thing is used across all filesystems along with a great
> > > > >>>>> many things that are not filesystems.  It doesn't really matter whether
> > > > >>>>> this laziness comes from a conversion to call_rcu_lazy() as in Joel's most
> > > > >>>>> recent patch, lack of a conversion to call_rcu_expedited() as in Thomas
> > > > >>>>> Gleixner's suggestion, or a mapping within call_rcu() as I suggested.
> > > > >>>>>
> > > > >>>>> Don't get me wrong, it would be great if this callback really can be
> > > > >>>>> always lazy, non-expedited, or whatever, but we should not take this
> > > > >>>>> for granted.
> > > > >>>>
> > > > >>>> Are you saying we should not make call_rcu() default-lazy for
> > > > >>>> everyone? According to Thomas, anyone expecting call_rcu() to finish
> > > > >>>> in any reasonable amount of time is a bug, and that usecase needs to
> > > > >>>> be fixed to use synchronize_rcu(). Are we saying now that that's not
> > > > >>>> our direction?
> > > > >>>
> > > > >>> I am saying that Thomas can sometimes get quite enthusiastic about
> > > > >>> a particular direction.  Plus I strongly suspect that Thomas has not
> > > > >>> actually reviewed all of the RCU callbacks.
> > > > >>>
> > > > >>> Of course, I agree that it won't hurt to try out Thomas's suggestion,
> > > > >>> which is why I did not object at the time.  But we should be fully
> > > > >>> prepared for it to blow up in our faces.  ;-)
> > > > >>
> > > > >> Makes sense, thanks for clarifying. My impression was he does expect some pain
> > > > >> in the short-term but the pain is worth not introduction a new opt-in API.
> > > > >>
> > > > >> I am wondering if putting it behind a CONFIG option will delay the pain too
> > > > >> much. Thoughts? I am assuming that we want the CONFIG to turn it off by default.
> > > > >
> > > > > I tried a poor man's version of Thomas's idea by doing
> > > > > jiffies_till_first_fqs=1000 and rcu.rcu_expedited=1 boot parameters. Of course
> > > > > this just for trying has no regard for memory pressure and such so it is in the
> > > > > 'do not try at home' category.
> > > > >
> > > > > Then I tried jiffies_till_first_fqs=128 and compare before/after kernel logs.
> > > > >
> > > > > These are on ChromeOS hardware (5.19-rc4 kernel)
> > > > > For jiffies_till_first_fqs=1000 Everything is fine until this point:
> > > > >     8.383646] Freeing unused kernel image (initmem) memory: 1676K
> > > > >    11.378880] Write protecting the kernel read-only data: 28672k
> > > > >
> > > > > I do not see this 3 second delay with jiffies_till_first_fqs=128.
> > > > >     8.411866] Freeing unused kernel image (initmem) memory: 1676K
> > > > >     8.656701] Write protecting the kernel read-only data: 28672k
> > > >
> > > > At least this could be because of the rodata write protection stuff calling
> > > > rcu_barrier():
> > > >
> > > > From init/main.c
> > > >
> > > > #ifdef CONFIG_STRICT_KERNEL_RWX
> > > > static void mark_readonly(void)
> > > > {
> > > >         if (rodata_enabled) {
> > > >                 /*
> > > >                  * load_module() results in W+X mappings, which are cleaned
> > > >                  * up with call_rcu().  Let's make sure that queued work is
> > > >                  * flushed so that we don't hit false positives looking for
> > > >                  * insecure pages which are W+X.
> > > >                  */
> > > >                 rcu_barrier();
> > > >                 mark_rodata_ro();
> > > >                 rodata_test();
> > > >         } else
> > > >                 pr_info("Kernel memory protection disabled.\n");
> > > > }
> > > >
> > > >
> > > > I think rcu_barrier() does not play well with jiffiest_till_first_fqs (probably
> > > > because of the missing wake ups I found in the existing code, I should try that
> > > > patch on this test..).
> > > >
> > > > Is it expected that a sleep involving jiffies_till_first_fqs does not get
> > > > disturbed due to rcu_barrier() ? That seems like a bug to me. At least for
> > > > !NOCB, I don't see any call the rcu_gp_kthread_wake() from rcu_barrier()!
> > 
> > Another approach is to take your current stack of patches, but only those
> > to RCU, and then as Thomas suggests, make call_rcu() behave lazily and
> > make a call_rcu_expedited() that retains the current behavior.  Keep the
> > current grace-period timings.
> 
> Yes these sound good. But one issue is, we do not know all the cases that
> needed call_rcu_expedited(). There is the per-cpu refcount which needs it,
> but I'm not aware of any others. If you take the patches, can we add convert
> to call_rcu_expedited() as regressions happen?

Exactly.

Several of the calls to wait_rcu_gp() would need their arguments to
change from call_rcu() to call_rcu_expedited().  Oddly enough, including
the call from synchronize_rcu_expedited() in the case where there are
to be no expedited grace periods.  ;-)

> > Then make synchronize_rcu() and friends use call_rcu_expedited() and see
> > if you can get away with only a small number of call_rcu_expedited()
> > "upgrades".
> 
> Right. It is a bit of a chicken-and-egg problem, if we have to whitelist
> everything, then we have to blacklist as we go. If we blacklist everything,
> then we may miss whitelisting something.
> 
> Thoughts?

Try it and see what happens.  We can always fall back to call_rcu_lazy().

> Meanwhile just to clarify- I am very much invested in seeing the current
> stack of patches merged soon give or take the _expedited() conversions,
> however I did the above jiffies_till_first experiments in the hopes they
> would show the ones that need such conversion quickly.

I suggest doing the experiment on top of your patch stack, but excluding
the call_rcu_lazy() conversios.  That way, as we come to agreement on
pieces we can pull the patches in, so that the experiments do not get
in the way.

							Thanx, Paul