Am Mittwoch, 23. November 2016, 16:40:07 schrieb Andreas Dilger: > On Nov 23, 2016, at 3:43 AM, Wolfgang Walter <linux@xxxxxxx> wrote: > > Am Dienstag, 22. November 2016, 16:02:53 schrieben Sie: > >> On Nov 22, 2016, at 6:56 AM, Wolfgang Walter <linux@xxxxxxx> wrote: > >>> Am Montag, 21. November 2016, 17:49:36 schrieben Sie: > >>>> On Nov 21, 2016, at 8:28 AM, Wolfgang Walter <linux@xxxxxxx> wrote: > >>>>> Hello, [snip] > Stepping back a bit - does this problem only happen with an external > journal device, or does it also happen with an internal journal? > So I tried that this weekend. I got again these messages JBD2: Spotted dirty metadata buffer (dev = dm-22, blocknr = 241763277). There's a risk of filesystem corruption in case of system crash. So this also happens with an internal journal. I think it has something to do with data=journal + writing fast a lot of data and removing and/or moving a files at the same time. I then tried it again with an internal journal but without the option journal_async_commit. I got these messages again. This time though, after some time running, the rsync got stuck in D state. I waited more than an hour but it did not recover. According to iostat there was no IO any more. I couldn't access the filesystem any more, ls -l oder touch got stuck in D state, too. I didn't get any kernel message, though. top reported the whole time one cpu (and always the same one) with about 100% wa time. No process but top itself seemed to be active. I never had that with external journal + journal_async_commit. When I run bonnie++ and copied large amount of data with dd at the same time I got some problems. But then the filesystem recovered every time, and I got messages like that one: INFO: task dd:16705 blocked for more than 120 seconds. Tainted: G E 4.8.8-debian64.all+1.1 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. dd D ffff9d70dfc18700 0 16705 16503 0x00000004 ffff9d686ab53000 ffff9d70d2897080 ffff9d713fffbb80 024280ca8d8d1e9c ffff9d70af8ac000 ffff9d70af8aba58 ffff9d90a94840ec ffff9d686ab53000 00000000ffffffff ffff9d90a94840f0 ffffffff8e7ee6f1 ffff9d90a94840e8 Call Trace: [<ffffffff8e7ee6f1>] ? schedule+0x31/0x80 [<ffffffff8e7ee98a>] ? schedule_preempt_disabled+0xa/0x10 [<ffffffff8e7f0574>] ? __mutex_lock_slowpath+0xb4/0x130 [<ffffffff8e7f060b>] ? mutex_lock+0x1b/0x30 [<ffffffffc042e32f>] ? __jbd2_log_wait_for_space+0x7f/0x1e0 [jbd2] [<ffffffffc0428384>] ? add_transaction_credits+0x284/0x2a0 [jbd2] [<ffffffff8e2eb31f>] ? add_timer+0x12f/0x220 [<ffffffffc04284ff>] ? start_this_handle+0x10f/0x420 [jbd2] [<ffffffffc0428c99>] ? jbd2__journal_start+0xe9/0x1f0 [jbd2] [<ffffffffc0530595>] ? ext4_setattr+0x205/0x880 [ext4] [<ffffffff8e4226d9>] ? notify_change+0x229/0x430 [<ffffffff8e401cf2>] ? do_truncate+0x72/0xc0 [<ffffffff8e412fa8>] ? path_openat+0x388/0x14d0 [<ffffffff8e415371>] ? do_filp_open+0x91/0x100 [<ffffffff8e3b92cc>] ? handle_mm_fault+0xd1c/0x14b0 [<ffffffff8e4031d7>] ? do_sys_open+0x127/0x210 [<ffffffff8e7f2ab6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8 INFO: task bonnie++:19693 blocked for more than 120 seconds. Tainted: G E 4.8.8-debian64.all+1.1 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. bonnie++ D ffff9d70dfad8700 0 19693 16248 0x00000000 ffff9d70b0287040 ffff9d70d289c0c0 ffffffff8e2c7266 ffff9d70b18e4000 ffff9d70b18e4000 ffff9d70b18e3aa0 ffff9d90a94840ec ffff9d70b0287040 00000000ffffffff ffff9d90a94840f0 ffffffff8e7ee6f1 ffff9d90a94840e8 Call Trace: [<ffffffff8e2c7266>] ? mutex_optimistic_spin+0x176/0x1c0 [<ffffffff8e7ee6f1>] ? schedule+0x31/0x80 [<ffffffff8e7ee98a>] ? schedule_preempt_disabled+0xa/0x10 [<ffffffff8e7f0574>] ? __mutex_lock_slowpath+0xb4/0x130 [<ffffffff8e7f060b>] ? mutex_lock+0x1b/0x30 [<ffffffffc042e32f>] ? __jbd2_log_wait_for_space+0x7f/0x1e0 [jbd2] [<ffffffffc0428384>] ? add_transaction_credits+0x284/0x2a0 [jbd2] [<ffffffffc04284ff>] ? start_this_handle+0x10f/0x420 [jbd2] [<ffffffffc0428c99>] ? jbd2__journal_start+0xe9/0x1f0 [jbd2] [<ffffffffc052e2b5>] ? ext4_write_begin+0x125/0x4f0 [ext4] [<ffffffff8e37f0f3>] ? generic_perform_write+0xd3/0x1d0 [<ffffffff8e381a46>] ? __generic_file_write_iter+0x196/0x1e0 [<ffffffffc05231b6>] ? ext4_file_write_iter+0x196/0x3d0 [ext4] [<ffffffff8e40365b>] ? __vfs_write+0xeb/0x160 [<ffffffff8e4047f3>] ? vfs_write+0xb3/0x1a0 [<ffffffff8e405bf2>] ? SyS_write+0x52/0xc0 [<ffffffff8e7f2ab6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8 But there was IO both on the filesystem and the journal (the devices of the filesystem is slower then the device the journal sits on). Regards, -- Wolfgang Walter Studentenwerk München Anstalt des öffentlichen Rechts -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html