On 11/09/2020 04:53, Paul E. McKenney wrote: > On Wed, Sep 09, 2020 at 10:31:03PM +1000, Alexey Kardashevskiy wrote: >> >> >> On 09/09/2020 21:50, Paul E. McKenney wrote: >>> On Wed, Sep 09, 2020 at 07:24:11PM +1000, Alexey Kardashevskiy wrote: >>>> >>>> >>>> On 09/09/2020 00:43, Alexey Kardashevskiy wrote: >>>>> init_srcu_struct_nodes() is called with is_static==true only internally >>>>> and when this happens, the srcu->sda is not initialized in >>>>> init_srcu_struct_fields() and we crash on dereferencing @sdp. >>>>> >>>>> This fixes the crash by moving "if (is_static)" out of the loop which >>>>> only does useful work for is_static=false case anyway. >>>>> >>>>> Found by syzkaller. >>>>> >>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>>>> --- >>>>> kernel/rcu/srcutree.c | 5 +++-- >>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c >>>>> index c100acf332ed..49b54a50bde8 100644 >>>>> --- a/kernel/rcu/srcutree.c >>>>> +++ b/kernel/rcu/srcutree.c >>>>> @@ -135,6 +135,9 @@ static void init_srcu_struct_nodes(struct srcu_struct *ssp, bool is_static) >>>>> levelspread[level - 1]; >>>>> } >>>>> >>>>> + if (is_static) >>>>> + return; >>>> >>>> Actually, this is needed here too: >>>> >>>> if (!ssp->sda) >>>> return; >>>> >>>> as >>>> ssp->sda = alloc_percpu(struct srcu_data) >>>> >>>> can fail if the process is killed too soon - it is quite easy to get >>>> this situation with syzkaller (syscalls fuzzer) >>>> >>>> Makes sense? >>> >>> Just to make sure that I understand, these failures occur when the task >>> running init_srcu_struct_nodes() is killed, correct? >> >> There are multiple user tasks (8) which open /dev/kvm, do 1 ioctl - >> KVM_CREATE_VM - and terminate, running on 8 vcpus, 8 VMs, crashes every >> 20min or so, less tasks or vcpus - and the problem does not appear. >> >> >>> >>> Or has someone managed to invoke (say) synchronize_srcu() on a >>> dynamically allocated srcu_struct before invoking init_srcu_struct() on >>> that srcu_struct? >> >> Nah, none of that :) >> >> init_srcu_struct_nodes() assumes ssp->sda!=NULL but alloc_percpu() fails >> here: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/percpu.c#n1734 >> === >> } else if (mutex_lock_killable(&pcpu_alloc_mutex)) { >> pcpu_memcg_post_alloc_hook(objcg, NULL, 0, size); >> return NULL; >> === >> >> I am still up to reading that osr-rcuusage.pdf to provide better >> analysis :) Thanks, > > Ah, got it! Does the following patch help? > > There will likely also need to be changes to cleanup_srcu_struct(), > but first let's see if I understand the problem. ;-) > > Thanx, Paul > > ------------------------------------------------------------------------ > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c > index c13348e..6f7880a 100644 > --- a/kernel/rcu/srcutree.c > +++ b/kernel/rcu/srcutree.c > @@ -177,11 +177,13 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static) > INIT_DELAYED_WORK(&ssp->work, process_srcu); > if (!is_static) > ssp->sda = alloc_percpu(struct srcu_data); > + if (!ssp->sda) > + return -ENOMEM; > init_srcu_struct_nodes(ssp, is_static); > ssp->srcu_gp_seq_needed_exp = 0; > ssp->srcu_last_gp_end = ktime_get_mono_fast_ns(); > smp_store_release(&ssp->srcu_gp_seq_needed, 0); /* Init done. */ The line above confuses me a bit. What you propose returns without smp_store_release() called which should not matter I suppose. Otherwise it should work, although I cannot verify right now as my box went down and since it is across Pacific - it may take time to power cycle it :) Thanks, > - return ssp->sda ? 0 : -ENOMEM; > + return 0; > } > > #ifdef CONFIG_DEBUG_LOCK_ALLOC > -- Alexey