On Fri, 14 Sep 2012 14:06:10 -0700, Anatol Pomozov <anatol.pomozov@xxxxxxxxx> wrote:
> Hi,
>
> I am debugging an issue that happens on our servers. We use ext4 in
> no-journal mode (2.6.34 kernel), and when we try to use
> asynchronous IO we see the following oops in dmesg:

Strange. I can't find the exact place where ext4_end_io_dio() invokes
ext4_orphan_del(). Can you please point to the place you are talking about?

>
> <3>[ 3983.762966] bad: scheduling from the idle thread!
> <4>[ 3983.762968] Pid: 0, comm: swapper
> <4>[ 3983.762970] Call Trace:
> <4>[ 3983.762972] <IRQ> [<ffffffff811d3fde>] dequeue_task_idle+0x24/0x30
> <4>[ 3983.762980] [<ffffffff81002f58>] schedule+0x2a98/0x3310
> <4>[ 3983.762985] [<ffffffff8101a08a>] ? sched_clock_cpu+0x2a/0xe0
> <4>[ 3983.762988] [<ffffffff8102b5d7>] ? mempool_alloc+0xa7/0x1a0
> <4>[ 3983.762992] [<ffffffff8100441b>] __mutex_lock_common.isra.3+0x14b/0x1d0
> <4>[ 3983.762996] [<ffffffff810045c3>] __mutex_lock_slowpath+0x13/0x20
> <4>[ 3983.762999] [<ffffffff81004242>] mutex_lock+0x22/0x40
> <4>[ 3983.763004] [<ffffffff8111918f>] ext4_orphan_del+0x4f/0x2e0
> <4>[ 3983.763008] [<ffffffff810b2e8c>] ? insert_work+0x6c/0xb0
> <4>[ 3983.763011] [<ffffffff81027af8>] ? diskmon_bio_complete+0x798/0xda0
> <4>[ 3983.763016] [<ffffffff812a33e8>] ext4_end_io_dio+0xb7/0x1d7
> <4>[ 3983.763021] [<ffffffff81050f3c>] dio_fast_end_async+0x1bc/0x1d0
> <4>[ 3983.763025] [<ffffffff8112c93a>] ? blk_complete_request+0x1a/0x20
> <4>[ 3983.763028] [<ffffffff81050a2d>] bio_endio+0x6d/0x80
> <4>[ 3983.763033] [<ffffffff81129002>] req_bio_endio+0x62/0xb0
> <4>[ 3983.763036] [<ffffffff81129202>] blk_update_request+0x142/0x3f0
> <4>[ 3983.763041] [<ffffffff8114232e>] ? ata_qc_complete+0xae/0x1f0
> <4>[ 3983.763044] [<ffffffff811299fc>] blk_end_bidi_request+0x2c/0xa0
> <4>[ 3983.763047] [<ffffffff81129a80>] blk_end_request+0x10/0x20
> <4>[ 3983.763050] [<ffffffff8113ffac>] scsi_io_completion+0xac/0x520
> <4>[ 3983.763053] [<ffffffff8113dca7>] scsi_finish_command+0xb7/0x110
> <4>[ 3983.763056] [<ffffffff8113fddf>] scsi_softirq_done+0x6f/0x140
> <4>[ 3983.763059] [<ffffffff8112c7d7>] blk_done_softirq+0x77/0x80
> <4>[ 3983.763062] [<ffffffff810156cf>] __do_softirq+0x37f/0x3e0
> <4>[ 3983.763066] [<ffffffff8109e7bc>] ? ack_apic_level+0x7c/0x1f0
> <4>[ 3983.763070] [<ffffffff810995cc>] call_softirq+0x1c/0x30
> <4>[ 3983.763072] [<ffffffff81005cf1>] do_softirq+0x41/0x80
> <4>[ 3983.763074] [<ffffffff81015879>] irq_exit+0x49/0xa0
> <4>[ 3983.763077] [<ffffffff810055b2>] do_IRQ+0x72/0xe0
> <4>[ 3983.763083] [<ffffffff814a0c13>] ret_from_intr+0x0/0xa
> <4>[ 3983.763084] <EOI> [<ffffffff81005da0>] ? c1e_idle+0x70/0x170
> <4>[ 3983.763089] [<ffffffff81005860>] cpu_idle+0x90/0x130
> <4>[ 3983.763091] [<ffffffff8117b45a>] rest_init+0x7e/0x80
> <4>[ 3983.763094] [<ffffffff81b45c62>] start_kernel+0x3b7/0x3c3
> <4>[ 3983.763097] [<ffffffff81b45331>] x86_64_start_reservations+0x141/0x145
> <4>[ 3983.763101] [<ffffffff81b4544c>] x86_64_start_kernel+0x117/0x11e
>
> So the problem is that ext4_orphan_del() wants to sleep in softirq
> context. I started debugging, and here are some questions.
>
> The first question is why ext4_orphan_del() sleeps in no-journal mode
> at all. It takes a mutex to manipulate the i_orphan list, but this list
> is used only in journaling mode. In no-journal mode (my case) both
> ext4_orphan_del() and ext4_orphan_add() should be no-ops.
>
> ext4_orphan_del() takes the mutex in no-journal mode when it is called with
> NULL as the first parameter. There are 10 places in fs/ext4 where this
> happens:
>
> $ git grep "ext4_orphan_del(NULL"
> fs/ext4/indirect.c:845: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:249: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:281: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:956: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:1069: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:1111: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:1177: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:4338: ext4_orphan_del(NULL, inode);
> fs/ext4/inode.c:4365: ext4_orphan_del(NULL, inode);
> fs/ext4/migrate.c:516: ext4_orphan_del(NULL, tmp_inode);
>
> Commit 3d287de3b828 fixed the ext4_orphan_del(NULL) issue in
> ext4_setattr() for no-journal mode, and I think we should fix all the
> other places as well.
>
> There are several possible solutions for this issue:
> 1) Pass the handle returned by ext4_journal_current_handle() or similar.
> Why do we pass NULL at all when we can use the handle? I see that some
> functions already have a "handle" variable that we can reuse.
> 2) Follow the approach used by Dmitry and call ext4_orphan_del() only if
> ext4_orphan_add() was successful *and* the handle is valid. This is not
> always possible, as not every _del() is paired with an _add() in the same
> function.
> 3) Inside ext4_orphan_del() and ext4_orphan_add(), check whether the
> journal is enabled and do nothing in no-journal mode. What is the best way
> to check for no-journal mode? Is it just "if (EXT4_SB(sb)->s_journal) ..."?
>
> It seems that #1 is the best way.
>
> PS: once the no-journal issue is clarified, I'll take a look at the
> sleeping issue in journaling mode.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html