On 11-02-20 08:55 AM, Mark Lord wrote:
> On 11-02-20 01:15 AM, Ted Ts'o wrote:
>> On Sun, Feb 20, 2011 at 12:05:27AM -0500, Mark Lord wrote:
>>> I suppose it must be, as there's no other 0x3c offset in that function.
>>> Which means it's probably this line that's crashing:
>>>
>>> BUG_ON(pa->pa_obj_lock != &ei->i_prealloc_lock);
>>>
>>> ...which could only happen if "pa" was NULL there.
>>> I wonder how that happened ?
>>
>> Which could only happen if ei->i_prealloc_list were not properly
>> initialized (i.e., it was still NULL). Which shouldn't ever
>> happen...., since all ext4_inodes are initialized in
>> ext4_alloc_inode().
>>
>> Hmm, can you replicate the crash?
>
> So far it has been a one time deal here,
> but stuff like this is pretty serious nonetheless.
>
> I suppose it could also happen if another thread did a list-delete
> at the same time as that function was running. Which would require
> that there be a locking bug/confusion somewhere.
>
> Looking over the code, most places use rcu to protect accesses,
> except for the fragment that crashed. That's probably just fine,
> but something to reexamine just out of paranoia.
>
> Also, the spinlock pointer appears to be dynamic, one of two
> possible spinlocks. Maybe something got confused there
> (well, obviously *something* got confused, so..).

That looks like the best candidate: perhaps pa->pa_obj_lock was one of
the per-cpu lg_prealloc_lock's at that point in time. In which case
an item could be deleted from the pa list concurrently with the function
that actually crashed?

That's as far as I can get with it in the time available.
You folks do know this code much better, so perhaps just expend
a few little grey cells on that theory before calling it quits?

Cheers!
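
For anyone who wants to poke at the theory, here is a minimal userspace
sketch of the "dynamic lock pointer" idea. It is NOT the ext4 code. Only
the names pa_obj_lock, i_prealloc_lock, i_prealloc_list and
lg_prealloc_lock come from this thread; every model_* type and function is
made up for illustration, a pthread mutex stands in for the kernel
spinlock, and a plain singly linked list stands in for the real list. The
model deliberately allows an entry on the inode's list to point at a
foreign lock, which is the situation the BUG_ON is meant to catch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a preallocation entry. */
struct model_pa {
    struct model_pa *next;        /* stand-in for the inode's pa list linkage */
    pthread_mutex_t *pa_obj_lock; /* lock expected to guard this entry's list membership */
};

/* Hypothetical model of the per-inode state. */
struct model_inode {
    pthread_mutex_t i_prealloc_lock;
    struct model_pa *i_prealloc_list;
};

/* Hypothetical model of a locality group with its own lock. */
struct model_group {
    pthread_mutex_t lg_prealloc_lock;
};

void model_inode_init(struct model_inode *ei)
{
    assert(ei != NULL);
    pthread_mutex_init(&ei->i_prealloc_lock, NULL);
    ei->i_prealloc_list = NULL;
}

/* Link pa onto the inode's list and record which lock "owns" it. */
void model_inode_add_pa(struct model_inode *ei, struct model_pa *pa,
                        pthread_mutex_t *owner_lock)
{
    pthread_mutex_lock(&ei->i_prealloc_lock);
    pa->pa_obj_lock = owner_lock;
    pa->next = ei->i_prealloc_list;
    ei->i_prealloc_list = pa;
    pthread_mutex_unlock(&ei->i_prealloc_lock);
}

/*
 * Count entries on the inode's list whose pa_obj_lock is not the inode's
 * own lock. Holding i_prealloc_lock does not exclude whoever holds the
 * other lock, so such entries could be unlinked concurrently.
 */
int model_count_foreign_locks(struct model_inode *ei)
{
    int n = 0;

    pthread_mutex_lock(&ei->i_prealloc_lock);
    for (struct model_pa *pa = ei->i_prealloc_list; pa != NULL; pa = pa->next)
        if (pa->pa_obj_lock != &ei->i_prealloc_lock)
            n++;
    pthread_mutex_unlock(&ei->i_prealloc_lock);
    return n;
}
```

A non-zero count from model_count_foreign_locks() corresponds to the
condition the BUG_ON checks for. Whether the real code can ever reach
that state is exactly the open question above.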