On Oct 17, 2013, at 4:14 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Oct 17, 2013 at 05:11:43PM -0400, George Spelvin wrote:
>>
>> Well, it happened again (error appended).  Can you please clarify what you mean
>> by "such BUG_ON()"; I'm having a hard time following the RCU code and determining
>> all the situations under which __fput() might be called.
>
> __fput() can be called via task_work_run() or via schedule_work().  That's
> all.  And it certainly should never be called with interrupts disabled.
> So stick BUG_ON(irqs_disabled()) in it (WARN_ON() might be better, but
> not by much).
>
> There are two ways these traces could've happened:
>	* exit_task_work() called by do_exit() with irqs disabled.
> Definitely buggy (and would do really nasty things to several functions
> called by do_exit() before that one).
>	* __fput() is called with irqs enabled, but somewhere on the
> way into ext4 (dput -> iput -> evict inode -> free blocks, now that
> unlinked file got closed -> ...) we manage to disable irqs and forget
> to enable them.

IMHO the most common cause of "BUG: sleeping function called from invalid
context" is a stack overflow.  This corrupts the task struct and incorrectly
sets the "in_interrupt" bit.

What kind of storage stack is underneath this filesystem?  If it is deep
(e.g. DM + LVM + iSCSI) then a stack overflow is definitely possible.

There was also a discussion by Christoph about page allocation recursing
into the fs again (in "xfs: prevent stack overflows from page cache
allocation"), though I'm not sure whether that applies to ext4.

Cheers, Andreas
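
To illustrate the stack overflow theory, here is a small userspace toy model
(not kernel code).  It assumes the layout I believe kernels of this era use,
where the per-task bookkeeping that holds preempt_count sits at the lowest
address of the kernel stack and the stack grows down towards it.  Every name
below (toy_stack, toy_thread_info, TOY_IRQ_MASK, ...) is made up for the
sketch, and the mask value is arbitrary rather than the kernel's real one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: sizes, names and the mask are hypothetical. */
#define TOY_STACK_SIZE 8192u
#define TOY_IRQ_MASK   0x0fffff00u  /* stand-in for the "in interrupt" bits */

struct toy_thread_info {
	uint32_t flags;
	uint32_t preempt_count;  /* what the in_interrupt()-style check reads */
};

/* The bookkeeping struct shares the lowest bytes of the stack area. */
union toy_stack {
	struct toy_thread_info info;
	unsigned char bytes[TOY_STACK_SIZE];
};

/* Model of the check: any IRQ-related bit set means "in interrupt". */
static int toy_in_interrupt(const union toy_stack *s)
{
	return (s->info.preempt_count & TOY_IRQ_MASK) != 0;
}

/*
 * Simulate a call chain that consumes `depth` bytes, starting at the top
 * of the stack area and growing down.  Nothing protects `info`, which is
 * the point: a deep enough chain scribbles over it.
 */
static void toy_use_stack(union toy_stack *s, size_t depth)
{
	assert(s != NULL);
	if (depth > TOY_STACK_SIZE)
		depth = TOY_STACK_SIZE;  /* stay inside the toy buffer */
	memset(s->bytes + (TOY_STACK_SIZE - depth), 0xAB, depth);
}

/* Returns 1 if a call chain of `depth` bytes makes the check misfire. */
static int toy_overflow_sets_in_interrupt(size_t depth)
{
	union toy_stack s;

	memset(&s, 0, sizeof(s));  /* clean task: preempt_count == 0 */
	toy_use_stack(&s, depth);
	return toy_in_interrupt(&s);
}
```

With a shallow chain, toy_overflow_sets_in_interrupt(1024) returns 0.  With
a chain that uses the whole area, toy_overflow_sets_in_interrupt(TOY_STACK_SIZE)
returns 1, even though no interrupt was ever involved.  That is the kind of
false "invalid context" report I have in mind.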