Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)

On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > looking at that and I think he is close to having something workable.
> > > 
> > > Mark -- is there anything Marco and Paul can try out?
> > 
> > I initially traced some issues back to commit:
> > 
> >   044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > 
> > ... and that change in semantics could cause us to miss edges in some
> > cases, but IIUC mostly where we haven't done the right thing in
> > exception entry/return.
> > 
> > I don't think my patches address this case yet, but my WIP (currently
> > just fixing user<->kernel transitions) is at:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > 
> > I'm looking into the kernel<->kernel transitions now, and I know that we
> > mess up RCU management for a small window around arch_cpu_idle, but it's
> > not immediately clear to me if either of those cases could cause this
> > report.
> 
> Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> problem (still get warnings and then a panic). :-/

I've just updated that branch with a new version which I hope covers
kernel<->kernel transitions too. If you get a chance, would you mind
giving that a spin?

The HEAD commit should be:

  a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")

Otherwise, I intend to clean that up and post it tomorrow (without the
additional debug hacks). I've thrown my local Syzkaller instance at it
in the meantime (and if I get the chance tomorrow I'll try to get
rcutorture set up), and the only report I'm seeing so far looks genuine:

| BUG: sleeping function called from invalid context in sta_info_move_state

... as that was reported on x86 too, per:

https://syzkaller.appspot.com/bug?id=6c7899acf008be2ddcddb46a2567c2153193632a

Thanks,
Mark.