Re: GPF from __srcu_read_lock() via drm_minor_acquire()

On Wed, Sep 16, 2020 at 02:37:30PM -0700, Paul E. McKenney wrote:
> On Wed, Sep 16, 2020 at 01:48:22PM -0700, Nick Desaulniers wrote:
> > Hey Paul and RCU folks,
> > I noticed we have bug reports from two users that seem to have similar
> > stack traces in SRCU code:
> > https://github.com/ClangBuiltLinux/linux/issues/1081
> > 
> > Is there a way we should go about starting to debug this?
> 
> Hello, Nick,
> 
> Huh.  It looks like the per-CPU memory referenced by the srcu_struct
> structure's ->sda field is unmapped.  That would certainly leave
> the next __srcu_read_lock() dazed and confused!
> 
> The trapping instruction is the increment instruction that I would
> expect to be there.  The source code is as follows:
> 
> 	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
> 	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> 	smp_mb();
> 
> Looking at the assembly:
> 
> 	  1e:	55                   	push   %ebp
> 	  1f:	89 e5                	mov    %esp,%ebp
> 
> The above is the function prologue.
> 
> 	  21:	8b 48 68             	mov    0x68(%eax),%ecx
> 
> The above instruction does READ_ONCE(ssp->srcu_idx).
> 
> 	  24:	8b 40 7c             	mov    0x7c(%eax),%eax
> 
> The above instruction fetches ssp->sda into %eax.  I therefore find it
> quite surprising that the dump contains "EAX: 00000000".  Or is this
> register value inaccurate?
> 
> 	  27:	83 e1 01             	and    $0x1,%ecx
> 
> The above instruction does the "& 0x1".  Therefore, at this point,
> %eax contains the address of the per-CPU srcu_data structure, but
> without the per-CPU offset having been applied.  Also, %ecx contains
> the array index, either 0 or 1.  Here we have zero, which is perfectly
> legitimate.
> 
> 	  2a:*	64 ff 04 88          	incl   %fs:(%eax,%ecx,4)
> 
> The above instruction does the this_cpu_inc().  Here %fs presumably
> supplies this CPU's per-CPU offset, which is added to the base address
> held in the ->sda pointer.
> 
> 	  2e:	f0 83 44 24 fc 00    	lock addl $0x0,-0x4(%esp)
> 
> The above instruction is the smp_mb().
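
For what it is worth, the address arithmetic of that incl instruction can
be modeled in a few lines of userspace C.  This is only a sketch:
effective_addr() and the sample values are made up for illustration, and
the only things taken from the dump are EAX being zero and the index being
zero.  The scale of 4 comes from the (%eax,%ecx,4) addressing mode.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: models the 32-bit effective address of
 * "incl %fs:(%eax,%ecx,4)", that is, fs_base + eax + ecx * 4.
 * fs_base stands in for this CPU's per-CPU offset.
 */
static uint32_t effective_addr(uint32_t fs_base, uint32_t eax, uint32_t ecx)
{
	assert(ecx <= 1);	/* the index was masked with 0x1 */
	return fs_base + eax + ecx * 4u;
}
```

With eax equal to zero, the result is just the per-CPU offset plus 0 or 4,
so the increment targets whatever the offset alone points at rather than a
counter inside an srcu_data structure.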
> 
> So here are a few questions that I would ask:

Oh, and this one:

0.	Did someone call srcu_read_lock() before init_srcu_struct()
	had been called on this srcu_struct structure?
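
	A rough userspace model of that ordering problem, for
	illustration only.  The model_* names are hypothetical and are
	not the kernel's definitions, and the model assumes a zero-filled
	structure whose ->sda is NULL until initialization and again
	after cleanup.  Where the real code would trap, the model
	returns -1 instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct model_srcu_data {
	unsigned long lock_count[2];
	unsigned long unlock_count[2];
};

struct model_srcu_struct {
	unsigned int idx;
	struct model_srcu_data *sda;	/* NULL before init and after cleanup */
};

static int model_init(struct model_srcu_struct *ssp)
{
	ssp->idx = 0;
	ssp->sda = calloc(1, sizeof(*ssp->sda));
	return ssp->sda ? 0 : -1;
}

static void model_cleanup(struct model_srcu_struct *ssp)
{
	free(ssp->sda);
	ssp->sda = NULL;
}

/* Returns the index used, or -1 where the real code would have trapped. */
static int model_read_lock(struct model_srcu_struct *ssp)
{
	int idx;

	if (!ssp->sda)
		return -1;
	idx = ssp->idx & 0x1;
	ssp->sda->lock_count[idx]++;
	return idx;
}

static void model_read_unlock(struct model_srcu_struct *ssp, int idx)
{
	assert(ssp->sda && (idx == 0 || idx == 1));
	ssp->sda->unlock_count[idx]++;
}
```

	In this model, a lock attempt made before model_init() or after
	model_cleanup() sees a NULL ->sda, which would be consistent
	with the zero value of %eax in the dump.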

							Thanx, Paul

> 1.	Did the init_srcu_struct() for this srcu_struct report an error?
> 	(Though with current mainline, that memory-allocation failure
> 	would more likely have page-faulted in init_srcu_struct().)
> 
> 2.	Has the srcu_struct in question already been passed to
> 	cleanup_srcu_struct()?
> 
> 3.	Has the value of %fs been clobbered?  Though that seems
> 	unlikely given that it also happens on aarch64.  Plus, the
> 	smoking gun seems to me to be the zero value of %eax.
> 
> 4.	If the above three questions fail to provide enlightenment,
> 	I suggest adding debug checks to anything that can unmap
> 	memory, and recording the value of ->sda somewhere so that
> 	you can check whether it is being changed (it should remain
> 	constant from init_srcu_struct()'s return through the
> 	corresponding call to cleanup_srcu_struct()).
> 
> Please let me know how it goes!
> 
> 							Thanx, Paul
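
The record-and-compare check suggested in question 4 might look roughly
like the following userspace sketch.  The sda_watch names are invented for
illustration and do not exist in the kernel; in kernel code the failing
case would more likely feed something like WARN_ON_ONCE() than a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug helper; none of these names exist in the kernel. */
struct sda_watch {
	const void *recorded;		/* ->sda value captured right after init */
	unsigned long mismatches;	/* number of failed checks so far */
};

static void sda_watch_record(struct sda_watch *w, const void *sda)
{
	assert(w != NULL);
	w->recorded = sda;
	w->mismatches = 0;
}

/* Returns 1 if the pointer is non-NULL and unchanged, 0 otherwise. */
static int sda_watch_check(struct sda_watch *w, const void *sda)
{
	if (sda != NULL && sda == w->recorded)
		return 1;
	w->mismatches++;
	return 0;
}
```

Recording once after initialization and checking before each read-side
lock would show whether ->sda was already NULL or had changed by the time
of the trap.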


