Re: kernel panic: corrupted stack end in dput

Eric Biggers <ebiggers@xxxxxxxxxx> · Wed, 3 Jul 2019 08:45:43 -0700

[+bpf and tls maintainers]

On Wed, Jul 03, 2019 at 04:23:34PM +0100, Al Viro wrote:
> On Wed, Jul 03, 2019 at 03:40:00PM +0100, Al Viro wrote:
> > On Wed, Jul 03, 2019 at 02:43:07PM +0800, Hillf Danton wrote:
> > 
> > > > This is very much *NOT* fine.
> > > > 	1) trylock can fail from any number of reasons, starting
> > > > with "somebody is going through the hash chain doing a lookup on
> > > > something completely unrelated"
> > > 
> > > They are also a red light that we need to bail out of spiraling up
> > > the directory hierarchy imho.
> > 
> > Translation: "let's leak the reference to parent, shall we?"
> > 
> > > > 	2) whoever had been holding the lock and whatever they'd
> > > > been doing might be over right after we get the return value from
> > > > spin_trylock().
> > > 
> > > Or after we send a mail using git. I don't know.
> > > 
> > > > 	3) even had that been really somebody adding children in
> > > > the same parent *AND* even if they really kept doing that, rather
> > > > than unlocking and buggering off, would you care to explain why
> > > > dentry_unlist() called by __dentry_kill() and removing the victim
> > > > from the list of children would be safe to do in parallel with that?
> > > >
> > > My bad. I have to walk around that unsafety.
> > 
> > WHAT unsafety?  Can you explain what you are seeing and how to
> > reproduce it, whatever it is?
> 
> BTW, what makes you think that it's something inside dput() itself?
> All I see is that at some point in the beginning of the loop body
> in dput() we observe a buggered stack.
> 
> Is that the first iteration through the loop?  IOW, is that just
> the place where we first notice preexisting corruption, or is
> that something the code called from that loop does?  If it's
> a stack overflow, I would be very surprised to see it here -
> dput() is iterative and it's called on a very shallow stack in
> those traces.
> 
> What happens if you e.g. turn that
> 	dput(dentry);
> in __fput() into
> 	rcu_read_lock(); rcu_read_unlock(); // trigger the check
> 	dput(dentry);
> 
> and run your reproducer?
> 

Please don't waste your time on this; it looks like just another report caused by
the massive memory corruption in BPF and/or TLS.  Look at the reproducer:

bpf$MAP_CREATE(0x0, &(0x7f0000000280)={0xf, 0x4, 0x4, 0x400, 0x0, 0x1}, 0x3c)
socket$rxrpc(0x21, 0x2, 0x800000000a)
r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
setsockopt$inet6_tcp_int(r0, 0x6, 0x13, &(0x7f00000000c0)=0x100000001, 0x1d4)
connect$inet6(r0, &(0x7f0000000140), 0x1c)
bpf$MAP_CREATE(0x0, &(0x7f0000000000)={0x5}, 0xfffffffffffffdcb)
bpf$MAP_CREATE(0x2, &(0x7f0000003000)={0x3, 0x0, 0x77fffb, 0x0, 0x10020000000, 0x0}, 0x2c)
setsockopt$inet6_tcp_TCP_ULP(r0, 0x6, 0x1f, &(0x7f0000000040)='tls\x00', 0x4)

It's the same as about 20 other syzbot reports.

- Eric