Hi,

On Wed, 21 Oct 2009 20:38:47 +0200, Bruno Prémont <bonbons@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> nilfs seems to have some dead-locks that put processes in D-state (at
> least on my arm system).
> This time around it seems that syslog-ng has been hit first. The
> previous times it most often was collectd/rrdtool.
>
> Kernel is vanilla 2.6.31.4 + a patch for USB HID device. System is arm,
> Feroceon 88FR131, SheevaPlug. nilfs is being used on a SD card
> (mmcblk0: mmc0:bc20 SD08G 7.60 GiB, mvsdio driver)
>
> Bruno
>
>
> Extracts from dmesg (less attempting to read a logfile produced by syslog-ng):
> INFO: task less:15839 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> less          D c02c8610     0 15839   1742 0x00000001
> [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c9318>] (__mutex_lock_slowpath+0x88/0x140)
> [<c02c9318>] (__mutex_lock_slowpath+0x88/0x140) from [<c009833c>] (generic_file_llseek+0x24/0x64)
> [<c009833c>] (generic_file_llseek+0x24/0x64) from [<c0096d74>] (vfs_llseek+0x54/0x64)
> [<c0096d74>] (vfs_llseek+0x54/0x64) from [<c00981c8>] (sys_llseek+0x74/0xcc)
> [<c00981c8>] (sys_llseek+0x74/0xcc) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)
>
>
> All stuck processes as listed by SysRq + T:
> syslog-ng     D c02c8610     0  1698      1 0x00000000
> [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c8e60>] (schedule_timeout+0x14c/0x1e8)
> [<c02c8e60>] (schedule_timeout+0x14c/0x1e8) from [<c02c8344>] (io_schedule_timeout+0x34/0x58)
> [<c02c8344>] (io_schedule_timeout+0x34/0x58) from [<c007caf0>] (congestion_wait+0x5c/0x80)
> [<c007caf0>] (congestion_wait+0x5c/0x80) from [<c0070fb0>] (balance_dirty_pages_ratelimited_nr+0xe8/0x290)
> [<c0070fb0>] (balance_dirty_pages_ratelimited_nr+0xe8/0x290) from [<c0069f04>] (generic_file_buffered_write+0x10c/0x348)
> [<c0069f04>] (generic_file_buffered_write+0x10c/0x348) from [<c006a64c>] (__generic_file_aio_write_nolock+0x264/0x4f4)
> [<c006a64c>] (__generic_file_aio_write_nolock+0x264/0x4f4) from [<c006b3b8>] (generic_file_aio_write+0x74/0xe8)
> [<c006b3b8>] (generic_file_aio_write+0x74/0xe8) from [<c0097228>] (do_sync_write+0xbc/0x100)
> [<c0097228>] (do_sync_write+0xbc/0x100) from [<c0097d3c>] (vfs_write+0xb0/0x164)
> [<c0097d3c>] (vfs_write+0xb0/0x164) from [<c0097ec0>] (sys_write+0x40/0x70)
> [<c0097ec0>] (sys_write+0x40/0x70) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)
> less          D c02c8610     0 15839   1742 0x00000001
> [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c9318>] (__mutex_lock_slowpath+0x88/0x140)
> [<c02c9318>] (__mutex_lock_slowpath+0x88/0x140) from [<c009833c>] (generic_file_llseek+0x24/0x64)
> [<c009833c>] (generic_file_llseek+0x24/0x64) from [<c0096d74>] (vfs_llseek+0x54/0x64)
> [<c0096d74>] (vfs_llseek+0x54/0x64) from [<c00981c8>] (sys_llseek+0x74/0xcc)
> [<c00981c8>] (sys_llseek+0x74/0xcc) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)
> sshd          D c02c8610     0 15844  15842 0x00000001
> [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c8e60>] (schedule_timeout+0x14c/0x1e8)
> [<c02c8e60>] (schedule_timeout+0x14c/0x1e8) from [<c02c8344>] (io_schedule_timeout+0x34/0x58)
> [<c02c8344>] (io_schedule_timeout+0x34/0x58) from [<c007caf0>] (congestion_wait+0x5c/0x80)
> [<c007caf0>] (congestion_wait+0x5c/0x80) from [<c0070fb0>] (balance_dirty_pages_ratelimited_nr+0xe8/0x290)
> [<c0070fb0>] (balance_dirty_pages_ratelimited_nr+0xe8/0x290) from [<c0069f04>] (generic_file_buffered_write+0x10c/0x348)
> [<c0069f04>] (generic_file_buffered_write+0x10c/0x348) from [<c006a64c>] (__generic_file_aio_write_nolock+0x264/0x4f4)
> [<c006a64c>] (__generic_file_aio_write_nolock+0x264/0x4f4) from [<c006b3b8>] (generic_file_aio_write+0x74/0xe8)
> [<c006b3b8>] (generic_file_aio_write+0x74/0xe8) from [<c0097228>] (do_sync_write+0xbc/0x100)
> [<c0097228>] (do_sync_write+0xbc/0x100) from [<c0097d3c>] (vfs_write+0xb0/0x164)
> [<c0097d3c>] (vfs_write+0xb0/0x164) from [<c0097ec0>] (sys_write+0x40/0x70)
> [<c0097ec0>] (sys_write+0x40/0x70) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)
>
> nilfs related processes:
> [40049.761881] segctord      S c02c8610     0   859      2 0x00000000
> [40049.761894] [<c02c8610>] (schedule+0x2a8/0x3b0) from [<bf0154ec>] (nilfs_segctor_thread+0x2d4/0x328 [nilfs2])
> [40049.761999] [<bf0154ec>] (nilfs_segctor_thread+0x2d4/0x328 [nilfs2]) from [<c004aac0>] (kthread+0x7c/0x84)
> [40049.762081] [<c004aac0>] (kthread+0x7c/0x84) from [<c0023940>] (kernel_thread_exit+0x0/0x8)
> [40049.762101] nilfs_cleaner S c02c8610     0   860      1 0x00000000
> [40049.762115] [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c99f4>] (do_nanosleep+0xb0/0x110)
> [40049.762137] [<c02c99f4>] (do_nanosleep+0xb0/0x110) from [<c004eed0>] (hrtimer_nanosleep+0xa4/0x12c)
> [40049.762161] [<c004eed0>] (hrtimer_nanosleep+0xa4/0x12c) from [<c004eff4>] (sys_nanosleep+0x9c/0xa4)
> [40049.762181] [<c004eff4>] (sys_nanosleep+0x9c/0xa4) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)
> [40049.762201] segctord      S c02c8610     0   862      2 0x00000000
> [40049.762214] [<c02c8610>] (schedule+0x2a8/0x3b0) from [<bf0154ec>] (nilfs_segctor_thread+0x2d4/0x328 [nilfs2])
> [40049.762298] [<bf0154ec>] (nilfs_segctor_thread+0x2d4/0x328 [nilfs2]) from [<c004aac0>] (kthread+0x7c/0x84)
> [40049.762377] [<c004aac0>] (kthread+0x7c/0x84) from [<c0023940>] (kernel_thread_exit+0x0/0x8)
> [40049.762397] nilfs_cleaner S c02c8610     0   863      1 0x00000000
> [40049.762411] [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c99f4>] (do_nanosleep+0xb0/0x110)
> [40049.762433] [<c02c99f4>] (do_nanosleep+0xb0/0x110) from [<c004eed0>] (hrtimer_nanosleep+0xa4/0x12c)
> [40049.762455] [<c004eed0>] (hrtimer_nanosleep+0xa4/0x12c) from [<c004eff4>] (sys_nanosleep+0x9c/0xa4)
> [40049.762475] [<c004eff4>] (sys_nanosleep+0x9c/0xa4) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)
> [40049.762495] segctord      S c02c8610     0   865      2 0x00000000
> [40049.762507] [<c02c8610>] (schedule+0x2a8/0x3b0) from [<bf0154ec>] (nilfs_segctor_thread+0x2d4/0x328 [nilfs2])
> [40049.762591] [<bf0154ec>] (nilfs_segctor_thread+0x2d4/0x328 [nilfs2]) from [<c004aac0>] (kthread+0x7c/0x84)
> [40049.762670] [<c004aac0>] (kthread+0x7c/0x84) from [<c0023940>] (kernel_thread_exit+0x0/0x8)
> [40049.762690] nilfs_cleaner S c02c8610     0   866      1 0x00000000
> [40049.762703] [<c02c8610>] (schedule+0x2a8/0x3b0) from [<c02c99f4>] (do_nanosleep+0xb0/0x110)
> [40049.762726] [<c02c99f4>] (do_nanosleep+0xb0/0x110) from [<c004eed0>] (hrtimer_nanosleep+0xa4/0x12c)
> [40049.762748] [<c004eed0>] (hrtimer_nanosleep+0xa4/0x12c) from [<c004eff4>] (sys_nanosleep+0x9c/0xa4)
> [40049.762768] [<c004eff4>] (sys_nanosleep+0x9c/0xa4) from [<c0022f00>] (ret_fast_syscall+0x0/0x2c)

Thank you for reporting the issue.

According to the log, the log-writer of nilfs appears to be idle even
though it has some requests waiting.

Could you try the following patch to narrow down the issue?

I'll dig into this issue next week since I'm now away from my office
to attend the Linux symposium in Tokyo.

Thank you,
Ryusuke Konishi

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 51ff3d0..0932571 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2471,6 +2471,8 @@ static void nilfs_segctor_notify(struct nilfs_sc_info *sci,
 	sci->sc_state &= ~NILFS_SEGCTOR_COMMIT;
 
 	if (req->mode == SC_LSEG_SR) {
+		printk(KERN_DEBUG "%s: completed request from=%d to=%d\n",
+		       __func__, sci->sc_seq_done, req->seq_accepted);
 		sci->sc_seq_done = req->seq_accepted;
 		nilfs_segctor_wakeup(sci, req->sc_err ? : req->sb_err);
 		sci->sc_flush_request = 0;
@@ -2668,6 +2670,11 @@ static int nilfs_segctor_thread(void *arg)
 		if (sci->sc_state & NILFS_SEGCTOR_QUIT)
 			goto end_thread;
 
+		printk(KERN_DEBUG
+		       "%s: sequence: req=%u, done=%u, state=%lx, timeout=%d\n",
+		       __func__, sci->sc_seq_request, sci->sc_seq_done,
+		       sci->sc_state, timeout);
+
 		if (timeout || sci->sc_seq_request != sci->sc_seq_done)
 			mode = SC_LSEG_SR;
 		else if (!sci->sc_flush_request)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html