On Tue, Sep 22, 2020 at 10:41:37AM +1000, Alexey Kardashevskiy wrote: > > > On 17/09/2020 02:12, Paul E. McKenney wrote: > > On Fri, Sep 11, 2020 at 06:52:08AM -0700, Paul E. McKenney wrote: > >> On Fri, Sep 11, 2020 at 03:09:41PM +1000, Alexey Kardashevskiy wrote: > >>> On 11/09/2020 04:53, Paul E. McKenney wrote: > >>>> On Wed, Sep 09, 2020 at 10:31:03PM +1000, Alexey Kardashevskiy wrote: > > > > [ . . . ] > > > >>>>> init_srcu_struct_nodes() assumes ssp->sda!=NULL but alloc_percpu() fails > >>>>> here: > >>>>> > >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/percpu.c#n1734 > >>>>> === > >>>>> } else if (mutex_lock_killable(&pcpu_alloc_mutex)) { > >>>>> pcpu_memcg_post_alloc_hook(objcg, NULL, 0, size); > >>>>> return NULL; > >>>>> === > >>>>> > >>>>> I am still up to reading that osr-rcuusage.pdf to provide better > >>>>> analysis :) Thanks, > >>>> > >>>> Ah, got it! Does the following patch help? > >>>> > >>>> There will likely also need to be changes to cleanup_srcu_struct(), > >>>> but first let's see if I understand the problem. ;-) > >>>> > >>>> Thanx, Paul > >>>> > >>>> ------------------------------------------------------------------------ > >>>> > >>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > >>>> index c13348e..6f7880a 100644 > >>>> --- a/kernel/rcu/srcutree.c > >>>> +++ b/kernel/rcu/srcutree.c > >>>> @@ -177,11 +177,13 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static) > >>>> INIT_DELAYED_WORK(&ssp->work, process_srcu); > >>>> if (!is_static) > >>>> ssp->sda = alloc_percpu(struct srcu_data); > >>>> + if (!ssp->sda) > >>>> + return -ENOMEM; > >>>> init_srcu_struct_nodes(ssp, is_static); > >>>> ssp->srcu_gp_seq_needed_exp = 0; > >>>> ssp->srcu_last_gp_end = ktime_get_mono_fast_ns(); > >>>> smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */ > >>> > >>> The line above confuses me a bit. What you propose returns without > >>> smp_store_release() called which should not matter I suppose. > >> > >> The idea is that if init_srcu_struct() returns -ENOMEM, the structure > >> has not been initialized and had better not be used. If the calling code > >> cannot handle that outcome, then the calling code needs to do something > >> to insulate init_srcu_struct() from signals. One thing that it could > >> do would be to invoke init_srcu_struct() from a workqueue handler and > >> wait for this handler to complete. > >> > >> Please keep in mind that there is nothing init_srcu_struct() can do > >> about this: The srcu_struct is useless unless alloc_percpu() succeeds. > >> > >> And yes, I do need to update the header comments to make this clear. > >> > >>> Otherwise it should work, although I cannot verify right now as my box > >>> went down and since it is across Pacific - it may take time to power > >>> cycle it :) Thanks, > >> > >> I know that feeling! And here is hoping that the box is out of reach > >> of the local hot spots. ;-) > > > > Just following up... Did that patch help? > > Yes it did. > > Tested-by: Alexey Kardashevskiy <aik@xxxxxxxxx> Thank you, and I will apply on the next rebase. Thanx, Paul