On Fri, Jul 13, 2018 at 11:33:37AM -0400, Peter Geis wrote:
> Good Morning,
>
> I have been trying to track down a bug that has been causing my Tegra3
> device to reboot while compiling.
> I finally managed to catch the offender, the details are below:
> The offending code is a triggered bug in dget_parent, the code is:
>	rcu_read_unlock();
>	BUG_ON(!ret->d_lockref.count);
>	ret->d_lockref.count++;

Interesting...  We call that while holding a reference to dentry (we'd
better).  That code is

	rcu_read_lock();
	ret = dentry->d_parent;

ret won't get freed until after rcu_read_unlock, so spin_lock is safe here

	spin_lock(&ret->d_lock);
	if (unlikely(ret != dentry->d_parent)) {
		spin_unlock(&ret->d_lock);
		rcu_read_unlock();
		goto repeat;
	}

Since we got through that, we have observed dentry->d_parent == ret with
ret->d_lock held.

	rcu_read_unlock();
	BUG_ON(!ret->d_lockref.count);

Now, this means that dentry->d_parent is *not* equal to ret anymore -
otherwise ret would remain pinned.

The only place that changes ->d_parent of a live dentry is __d_move() -
no other assignments exist.  __d_move() is done under rename_lock - it's
globally serialized.  And it grabs ->d_lock on all parents involved before
modifying ->d_parent of anything, so the observed condition
(ret == dentry->d_parent, ret->d_lock held by us) can't change until we
drop ret->d_lock...

Which kernel had that been?  It looks either like a memory corruption
(anywhere) or as if you called that with dentry itself getting killed
right under you.  Reference to ->d_parent is not dropped until after
the last reference to dentry goes away, so...

Could you slap

	if (WARN_ON(!ret->d_lockref.count))
		printk(KERN_ERR "child: %px[%ld], parent: %px:%px\n",
			dentry, (long)dentry->d_lockref.count,
			dentry->d_parent, ret);

right before that rcu_read_unlock() and see if you can trigger that?
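
For anyone following along, the lock-then-recheck pattern discussed above can be tried out in userspace.  What follows is only a rough model, not the kernel's dget_parent(): struct node, node_init(), node_lock(), node_unlock() and node_get_parent() are hypothetical names, a C11 atomic_flag spinlock stands in for ->d_lock, there is no RCU at all, and the place where the kernel would hit BUG_ON() just prints the same kind of diagnostic and returns NULL.

```c
/*
 * Userspace model only.  All names here are hypothetical; this is not
 * kernel code and it does not model RCU or rename_lock.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct node {
	atomic_flag lock;		/* stands in for ->d_lock */
	long count;			/* stands in for ->d_lockref.count */
	_Atomic(struct node *) parent;	/* stands in for ->d_parent */
};

void node_lock(struct node *n)
{
	/* Spin until we are the one who set the flag. */
	while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
		;
}

void node_unlock(struct node *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

void node_init(struct node *n, struct node *parent, long count)
{
	atomic_flag_clear(&n->lock);
	n->count = count;
	/* A root is modelled as its own parent. */
	atomic_init(&n->parent, parent ? parent : n);
}

/*
 * Lock the parent, re-check that it is still the parent, then take a
 * reference.  Returns NULL (after printing a diagnostic) in the case
 * where the kernel code would trip BUG_ON().
 */
struct node *node_get_parent(struct node *child)
{
	struct node *ret;

	assert(child != NULL);
repeat:
	ret = atomic_load(&child->parent);	/* unlocked read, re-checked below */
	node_lock(ret);
	if (ret != atomic_load(&child->parent)) {
		/* Parent changed before we got the lock; retry. */
		node_unlock(ret);
		goto repeat;
	}
	/* From here on, child->parent == ret was observed with ret locked. */
	if (ret->count == 0) {
		fprintf(stderr, "child: %p[%ld], parent: %p:%p\n",
			(void *)child, child->count,
			(void *)atomic_load(&child->parent), (void *)ret);
		node_unlock(ret);
		return NULL;
	}
	ret->count++;
	node_unlock(ret);
	return ret;
}
```

With a parent whose count is 1, node_get_parent() on the child returns the parent and bumps its count to 2.  Forcing the parent's count to 0 by hand, which is the "should never happen" state from the report, takes the diagnostic path instead.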