+rcu@vger for archives.

On Tue, Sep 20, 2022 at 7:37 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> On 9/20/2022 6:29 AM, Joel Fernandes wrote:
> >
> > On 9/19/2022 7:40 PM, Joel Fernandes wrote:
> >>
> >> On 9/19/2022 6:32 PM, Paul E. McKenney wrote:
> >>> On Mon, Sep 19, 2022 at 05:38:19PM -0400, Joel Fernandes wrote:
> >>>> On Mon, Sep 19, 2022 at 9:11 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Mon, Sep 19, 2022 at 10:08:28AM +0200, Uladzislau Rezki wrote:
> >>>>>> On Wed, Sep 14, 2022 at 04:22:51AM -0700, Paul E. McKenney wrote:
> >>>>>>> On Wed, Sep 14, 2022 at 11:38:32AM +0100, Uladzislau Rezki wrote:
> >>>>>>>> On Wed, Sep 14, 2022, 11:23 Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>> On Wed, Sep 14, 2022 at 02:34:23AM -0700, Paul E. McKenney wrote:
> >>>>>>>>>> Agreed. There could be a list of functions in call_rcu() that are
> >>>>>>>>>> treated specially.
> >>>>>>>>>>
> >>>>>>>>>> But please be careful. Maintainers of a given function might have
> >>>>>>>>>> something to say about that function being added to that list. ;-)
> >>>>>>>>>>
> >>>>>>>>>> But one big advantage of this is that the list of functions could be
> >>>>>>>>>> private to ChromeOS and/or Android. Maintainers might still be annoyed
> >>>>>>>>>> about bugs sent to them whose root cause was the addition of their
> >>>>>>>>>> function to the call_rcu() list, though. ;-)
> >>>>>>>>>
> >>>>>>>>> Are you sure you want this feature to be usable and debuggable by Android
> >>>>>>>>> only?
> >>>>>>>>
> >>>>>>>> Adding on top of Frederic's sentence. Sorry for that 😀
> >>>>>>>>
> >>>>>>>> Do you mean to have the function names internally, to be able to detect
> >>>>>>>> what is called and, based on that func list, place them on a lazy track?
> >>>>>>>
> >>>>>>> Yes, if needed. But in addition to using call_rcu_lazy() where the
> >>>>>>> callback functions' maintainers are good with it.
> >>>>>>>
> >>>>>>> Of course, if all relevant maintainers are OK with all the callback
> >>>>>>> functions being passed to call_rcu_lazy(), then there is no need for
> >>>>>>> the internal switching.
> >>>>>>
> >>>>>> Then we would need to map function names internally. I think the feedback
> >>>>>> we got from the conference session is that it makes sense to improve the
> >>>>>> batching of call_rcu() in general. Doing that, we improve and cover all
> >>>>>> current users, so we do not need to distinguish the lazy variant from the
> >>>>>> normal one. We also do not need to explain the difference to people or
> >>>>>> switch to _lazy() across the kernel code. And, as we also discussed, one
> >>>>>> regular non-lazy call would revert everything back to the problem in
> >>>>>> question.
> >>>>>>
> >>>>>> As Thomas said, look at it as: delay of invocation, and if needed in a
> >>>>>> corner case, provide a kind of expedited or sync support, for example
> >>>>>> call_rcu_sync().
> >>>>>>
> >>>>>> It is not up to me but... I just share my view :)
> >>>>>
> >>>>> Me, I am a bit pessimistic about the probability of us getting away with
> >>>>> making the file-close RCU callback be lazy.
> >>>>>
> >>>>> After all, this thing is used across all filesystems along with a great
> >>>>> many things that are not filesystems. It doesn't really matter whether
> >>>>> this laziness comes from a conversion to call_rcu_lazy() as in Joel's most
> >>>>> recent patch, lack of a conversion to call_rcu_expedited() as in Thomas
> >>>>> Gleixner's suggestion, or a mapping within call_rcu() as I suggested.
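[For readers coming from the archives: the "list of functions in call_rcu()
treated specially" being discussed above would look roughly like the sketch
below. This is only an illustration of the internal-switching idea; the
rcu_lazy_fns[] list and the __call_rcu_common() laziness flag are assumed
names (the latter follows the shape used in Joel's call_rcu_lazy series),
not code from any particular tree.]

	/* Sketch: route a private list of callbacks onto the lazy path. */
	static const rcu_callback_t rcu_lazy_fns[] = {
		/* e.g., the file-close callback, if its maintainers agree */
	};

	void call_rcu(struct rcu_head *head, rcu_callback_t func)
	{
		int i;

		/* Callbacks on the private list get the lazy treatment. */
		for (i = 0; i < ARRAY_SIZE(rcu_lazy_fns); i++) {
			if (func == rcu_lazy_fns[i]) {
				__call_rcu_common(head, func, /* lazy = */ true);
				return;
			}
		}
		/* Everything else keeps today's non-lazy behavior. */
		__call_rcu_common(head, func, /* lazy = */ false);
	}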
> >>>>>
> >>>>> Don't get me wrong, it would be great if this callback really can be
> >>>>> always lazy, non-expedited, or whatever, but we should not take this
> >>>>> for granted.
> >>>>
> >>>> Are you saying we should not make call_rcu() default-lazy for
> >>>> everyone? According to Thomas, anyone expecting call_rcu() to finish
> >>>> in any reasonable amount of time is a bug, and that usecase needs to
> >>>> be fixed to use synchronize_rcu(). Are we saying now that that's not
> >>>> our direction?
> >>>
> >>> I am saying that Thomas can sometimes get quite enthusiastic about
> >>> a particular direction. Plus I strongly suspect that Thomas has not
> >>> actually reviewed all of the RCU callbacks.
> >>>
> >>> Of course, I agree that it won't hurt to try out Thomas's suggestion,
> >>> which is why I did not object at the time. But we should be fully
> >>> prepared for it to blow up in our faces. ;-)
> >>
> >> Makes sense, thanks for clarifying. My impression was that he does expect
> >> some pain in the short term, but that the pain is worth not introducing a
> >> new opt-in API.
> >>
> >> I am wondering if putting it behind a CONFIG option will delay the pain too
> >> much. Thoughts? I am assuming that we want the CONFIG to turn it off by default.
> >
> > I tried a poor man's version of Thomas's idea with the
> > jiffies_till_first_fqs=1000 and rcu.rcu_expedited=1 boot parameters. Of
> > course, trying it like this has no regard for memory pressure and such, so
> > it is in the 'do not try at home' category.
> >
> > Then I tried jiffies_till_first_fqs=128 and compared before/after kernel logs.
> >
> > These are on ChromeOS hardware (5.19-rc4 kernel).
> > For jiffies_till_first_fqs=1000, everything is fine until this point:
> > [    8.383646] Freeing unused kernel image (initmem) memory: 1676K
> > [   11.378880] Write protecting the kernel read-only data: 28672k
> >
> > I do not see this 3-second delay with jiffies_till_first_fqs=128:
> > [    8.411866] Freeing unused kernel image (initmem) memory: 1676K
> > [    8.656701] Write protecting the kernel read-only data: 28672k

At least this could be because of the rodata write protection stuff calling
rcu_barrier():

From init/main.c:

	#ifdef CONFIG_STRICT_KERNEL_RWX
	static void mark_readonly(void)
	{
		if (rodata_enabled) {
			/*
			 * load_module() results in W+X mappings, which are cleaned
			 * up with call_rcu(). Let's make sure that queued work is
			 * flushed so that we don't hit false positives looking for
			 * insecure pages which are W+X.
			 */
			rcu_barrier();
			mark_rodata_ro();
			rodata_test();
		} else
			pr_info("Kernel memory protection disabled.\n");
	}

I think rcu_barrier() does not play well with jiffies_till_first_fqs
(probably because of the missing wake-ups I found in the existing code; I
should try that patch on this test..).

Is it expected that a sleep involving jiffies_till_first_fqs does not get
disturbed by rcu_barrier()? That seems like a bug to me. At least for
!NOCB, I don't see any call to rcu_gp_kthread_wake() from rcu_barrier()!

Thanks,

- Joel

> > So I am guessing initmem uses call_rcu() in a 'synchronous' way or
> > something, OR it is doing synchronize_rcu() and rcu_expedited is not
> > working at this point of the boot. But this is just after the initramfs
> > filesystem is mounted, so it should be expediting.
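[As an aside on the rcu_gp_kthread_wake() observation above: the kind of
one-liner being hypothesized would presumably sit in the barrier entrain
path, roughly as sketched below. This is purely illustrative -- whether
this is the right call site, and whether it is safe with respect to locking
and the NOCB paths, is exactly what would need auditing before any real
patch; surrounding context is elided.]

	/* kernel/rcu/tree.c, rcu_barrier_entrain(), hypothetical addition: */
	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
		atomic_inc(&rcu_state.barrier_cpu_count);
		/*
		 * Assumption: wake the grace-period kthread so the newly
		 * entrained barrier callback need not wait out a large
		 * jiffies_till_first_fqs timeout.
		 */
		rcu_gp_kthread_wake();
	}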
> >
> > The next big delay I notice is from starting udevd:
> >
> > Multi-second delay with jiffies_till_first_fqs=1000:
> > [    9.305123] SafeSetID: insecure policy detected: uid 213 is constrained [..]
> > [   15.499362] udevd[270]: starting version 225
> >
> > But not much delay with jiffies_till_first_fqs=128:
> > [    9.305123] SafeSetID: insecure policy detected: uid 213 is constrained [..]
> > [    9.490461] udevd[269]: starting version 225
> >
> > So there are 2 issues at boot slowing things down right off the bat. :-\
> >
> > I wonder how much more 'fixing' is needed to make Thomas's idea less
> > excruciating.
> >
> > Maybe a new call_rcu_lazy() API is the way to go after all?
> >
> > I'll try to trace the above 2 cases and see what the callbacks are doing.
> >
> > thanks,
> >
> > - Joel
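[For the record, the callback tracing planned above can be done with the
stock RCU tracepoints, no extra patches needed -- for example by booting
with something like:

	trace_event=rcu:rcu_callback,rcu:rcu_invoke_callback

and then reading /sys/kernel/tracing/trace around the two delays to see
which callback functions get queued and when they finally run. The
tracepoint names are from include/trace/events/rcu.h; the exact fields
printed vary by kernel version.]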