I've been trying to diagnose a problem where ext4 can't be reliably used under preempt-rt kernels. While I've not solved that yet, I have found two things that look to me to be possible issues that could also impact mainline (and two cosmetic fixes), so I'm sending them here to see if they make sense to others who are more familiar with filesystem code than I am. At the moment, I can't see why the issues I've identified here couldn't happen on mainline/non-RT. The series attached here was build/boot tested on v3.10-rc5. [ -- stop reading here if you don't care about how ext4 fails on RT -- ] With respect to the RT problem, it looks like I'm seeing a stall in jbd2_journal_commit_transaction() in the jbd2 thread, and then everything else piles up behind that. The stall looks like this: [ 1593.701664] jbd2/sda1-8 D ffff880223c758c0 4832 1214 2 0x00000000 [ 1593.701668] ffff88021ff97b10 0000000000000046 ffff88021ff97ac0 ffffffff8106e8f1 [ 1593.701669] ffff880223c75500 ffff88021ff97fd8 ffff88021ff97fd8 ffff88021ff97fd8 [ 1593.701670] ffffffff81c16440 ffff880223c75500 ffff88021ff97ae0 ffff880223c75500 [ 1593.701671] Call Trace: [ 1593.701673] [<ffffffff8106e8f1>] ? get_parent_ip+0x11/0x50 [ 1593.701674] [<ffffffff810e9560>] ? __lock_page+0x70/0x70 [ 1593.701676] [<ffffffff8172dc34>] schedule+0x24/0x70 [ 1593.701677] [<ffffffff8172dd0a>] io_schedule+0x8a/0xd0 [ 1593.701678] [<ffffffff810e9569>] sleep_on_page+0x9/0x10 [ 1593.701679] [<ffffffff8172cba8>] __wait_on_bit+0x58/0x90 [ 1593.701680] [<ffffffff810e9ca9>] ? find_get_pages_tag+0x109/0x190 [ 1593.701681] [<ffffffff810e9688>] wait_on_page_bit+0x78/0x80 [ 1593.701682] [<ffffffff810603d0>] ? autoremove_wake_function+0x40/0x40 [ 1593.701684] [<ffffffff810e9a39>] filemap_fdatawait_range+0xf9/0x190 [ 1593.701685] [<ffffffff812c4451>] ? submit_bio+0x71/0xe0 [ 1593.701686] [<ffffffff8106e8f1>] ? get_parent_ip+0x11/0x50 [ 1593.701687] [<ffffffff8103f125>] ? unpin_current_cpu+0x15/0x80 [ 1593.701689] [<ffffffff810e9aef>] filemap_fdatawait+0x1f/0x30 [ 1593.701691] [<ffffffff81220c34>] jbd2_journal_commit_transaction+0xb14/0x1c20 [ 1593.701692] [<ffffffff81226b91>] kjournald2+0xc1/0x270 Using jbd_debug() it seems that I end up with jbd2_log_do_checkpoint and jbd2_journal_commit_transaction running into each other. In one of my attached patches, I show they overlap to the point of interrupting each others jbd_debug messages. Maybe that doesn't matter? Here is a typical debug trace with time stamps: [ 695.229129] (fs/jbd2/journal.c, 495): __jbd2_log_start_commit: JBD2: requesting commit 42331/42330 [ 695.229141] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes [ 695.229145] (fs/jbd2/journal.c, 211): kjournald2: woke because of timeout [ 695.229147] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42330, commit_request=42331 [ 695.229150] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ [ 695.229158] (fs/jbd2/commit.c, 372): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42331 [ 695.229164] (fs/jbd2/journal.c, 495): __jbd2_log_start_commit: JBD2: requesting commit 42331/42330 [ 695.243653] (fs/jbd2/journal.c, 495): __jbd2_log_start_commit: JBD2: requesting commit 42331/42330 [ 695.247738] (fs/jbd2/checkpoint.c, 345): jbd2_log_do_checkpoint: Start checkpoint [ 695.247749] (fs/jbd2/checkpoint.c, 354): jbd2_log_do_checkpoint: cleanup_journal_tail returned 1 [ 695.248917] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42331, j_commit_sequence=42330 [ 695.265162] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 10 revoke records <filesystem now dead for writes> Notice that we never got the expected message: "jbd2_journal_commit_transaction: JBD2: commit 42331 complete" Stuck waiting/spinning somewhere in jbd2_journal_commit_transaction. As near as I can tell, it never got to phase 3 of commit_transaction. Since jbd2_journal_commit_transaction is such a large function, I'm tempted to break it up some, just to ease my debugging (compare 0x1c20 to the smaller numbers around it). Perhaps there would be interest in such kinds of patches for mainline? Thanks, Paul. --- Paul Gortmaker (4): jbd2/journal_commit_transaction: relocate state lock to incorporate all users jbd2/log_wait_for_space: drop checkpoint mutex when waiting jbd2: fix duplicate debug label for phase 2 jbd/jbd2: relocate bit_spinlock header to jbd_common fs/jbd2/checkpoint.c | 7 +++++++ fs/jbd2/commit.c | 7 ++++--- include/linux/jbd.h | 1 - include/linux/jbd2.h | 1 - include/linux/jbd_common.h | 2 ++ 5 files changed, 13 insertions(+), 5 deletions(-) -- 1.8.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html