Hi!

I have an old multi-boot x86 PC that I use for testing. One of its
root partitions is NILFS2, and it is booted via LILO and a JFS-formatted
/boot partition. All seems fine, but the umount of / can hang,
especially when NILFS2 had to recover / on boot in read-only mode due
to a crash. Using KDB to get stack traces, I wonder if segctord is
waiting for an event that will not happen. [Actually, the umount of
NILFS2 partitions can hang in other cases, too. This is a narrow case
that I can repeat fairly often.]

Is there a guaranteed good way to shut down nilfs_cleanerd and NILFS2
properly on system shutdown? I tried to ensure that the killall5
program doesn't touch nilfs_cleanerd on shutdown, but that solution has
stopped working again.

The PC is an i686 Pentium 4 PC (32-bit), running a 3-day-old git Linux
3.11.0+ kernel. The operating system is slackware-current. It's set up
for activities like kdb/kgdb and crash dumps, but I'm not very familiar
with some of the programs I've installed here, especially gdb.

If NILFS2 is more durable when the hard drive's write cache has been
shut off, let me know, and I'll start over using a fresh NILFS2 file
system and try to get this error again.

Thanks!
Michael

# SCENARIO #1: umount.nilfs2 and segctord are part of a hung shutdown
# For this, /boot and /tmp are JFS, and / is NILFS2. Once the non-NILFS2
# filesystems have been unmounted, there's an attempt to remount /
# read-only. However, it hangs like this in 1 of 10 reboots for
# clean mounts. If NILFS2 had to recover from a crash on boot, then it
# hangs in 1 of every 2 reboots:

0xde343ea0       70        2  0    0   D  0xde344158  segctord
 de3eddb8 00000092 de3edd70 c1071975 00000000 de343ea0 4950d87b 000000a3
 de3ec000 de343ea0 00000000 c1554337 de343ea0 00000002 de3edd98 00000002
 dfeee220 00000282 de3442e8 00000046 00000282 dfeee220 de3edda0 c107306f
Call Trace:
 [<c1071975>] ? lock_release_holdtime.part.22+0xba/0xed
 [<c1554337>] ? _raw_spin_unlock_irqrestore+0x2f/0x56
 [<c107306f>] ? trace_hardirqs_on+0xb/0xd
 [<c1109c41>] ? inode_lru_list_del+0x27/0x27
 [<c15529cb>] schedule+0x22/0x4c
 [<c1109c4e>] inode_wait+0xd/0x11
 [<c154fd9e>] __wait_on_bit+0x4e/0x6b
 [<c1109c41>] ? inode_lru_list_del+0x27/0x27
 [<c11179ff>] __inode_wait_for_writeback+0x80/0x98
 [<c104c06d>] ? autoremove_wake_function+0x3d/0x3d
 [<c1119d13>] inode_wait_for_writeback+0x1d/0x28
 [<c110a7a3>] evict+0x83/0x15d
 [<c110b2a1>] iput+0xc3/0x137
 [<c12a74a2>] nilfs_dispose_list+0xfc/0x14b
 [<c12a7867>] nilfs_transaction_unlock+0x55/0x5e
 [<c12aa000>] nilfs_segctor_thread+0xd5/0x2ad
 [<c12a9f2b>] ? nilfs_segctor_construct+0x229/0x229
 [<c104b557>] kthread+0xa7/0xa9
 [<c15556b7>] ret_from_kernel_thread+0x1b/0x28
 [<c104b4b0>] ? insert_kthread_work+0x63/0x63

Stack traceback for pid 392
0xc01514e0      392      391  0    0   D  0xc0151798  umount.nilfs2
 dddd7de0 00000086 3c20b7bc 00000129 00000000 c01514e0 1d8dd8ca 000000a2
 dddd6000 c01514e0 000000a2 0003794c 00000000 1d8deaf9 000000a2 00000000
 c107308b 00000000 dddd7dcc c10587b4 df016580 00000086 c0151950 dddd7dcc
Call Trace:
 [<c107308b>] ? trace_hardirqs_off_caller+0x1a/0x116
 [<c10587b4>] ? sched_clock_cpu+0x8f/0xe2
 [<c15529cb>] schedule+0x22/0x4c
 [<c154fc08>] schedule_timeout+0xf8/0x1e8
 [<c1554385>] ? _raw_spin_unlock_irq+0x27/0x36
 [<c107306f>] ? trace_hardirqs_on+0xb/0xd
 [<c1552d66>] wait_for_completion+0x9e/0xce
 [<c1055d3f>] ? try_to_wake_up+0x138/0x138
 [<c111a986>] sync_inodes_sb+0xc3/0x1f2
 [<c1552cf3>] ? wait_for_completion+0x2b/0xce
 [<c111da6a>] sync_filesystem+0x51/0x88
 [<c10f623b>] do_remount_sb+0x43/0x168
 [<c155205a>] ? down_write+0x92/0x99
 [<c1110f72>] SyS_umount+0x2cf/0x2ff
 [<c1554f0b>] ? restore_all+0xf/0xf
 [<c1110fc0>] SyS_oldumount+0x1e/0x20
 [<c155573b>] sysenter_do_call+0x12/0x32

# SCENARIO #2: segctord and sync are part of a hung shutdown
# Shutdown, using NILFS2 for / and /tmp. I tried to umount
# the non-NILFS2 filesystems first, then run sync, then umount
# the NILFS2 filesystems. It stopped at sync, where sync and
# segctord wait on the same things as do umount.nilfs2 and
# segctord in Scenario #1. In other words, the shutdown script
# might not have had a chance to umount the NILFS2 file systems.

Entering kdb (current=0xc171b620, pid 0) due to Keyboard Entry
kdb> ps
48 sleeping system daemon (state M) processes suppressed, use 'ps A' to see all.
Task Addr       Pid   Parent [*] cpu State Thread     Command
0xc171b620        0        0  1    0   R  0xc171b8d8 *swapper
0xdf098000        1        0  0    0   S  0xdf0982b8  init
0xdde9a9c0       72        2  0    0   D  0xdde9ac78  segctord
0xdd44e860      102        1  0    0   S  0xdd44eb18  nilfs_cleanerd
0xdd44a9c0      108        1  0    0   S  0xdd44ac78  nilfs_cleanerd
0xdae329c0     2187        1  0    0   S  0xdae32c78  rc.6
0xdde9e860     2264     2187  0    0   D  0xdde9eb18  sync

kdb> btp 72
Stack traceback for pid 72
0xdde9a9c0       72        2  0    0   D  0xdde9ac78  segctord
 dd421db8 00000092 dd421d70 c1071975 00000000 dde9a9c0 8da9428a 0000001e
 dd420000 dde9a9c0 00000000 c153ab67 dde9a9c0 00000002 dd421d98 00000002
 dfeee7c0 00000282 dde9ae08 00000046 00000282 dfeee7c0 dd421da0 c107306f
Call Trace:
 [<c1071975>] ? lock_release_holdtime.part.22+0xba/0xed
 [<c153ab67>] ? _raw_spin_unlock_irqrestore+0x2f/0x56
 [<c107306f>] ? trace_hardirqs_on+0xb/0xd
 [<c1109c41>] ? inode_lru_list_del+0x27/0x27
 [<c15391fb>] schedule+0x22/0x4c
 [<c1109c4e>] inode_wait+0xd/0x11
 [<c15365ce>] __wait_on_bit+0x4e/0x6b
 [<c1109c41>] ? inode_lru_list_del+0x27/0x27
 [<c11179ff>] __inode_wait_for_writeback+0x80/0x98
 [<c104c06d>] ? autoremove_wake_function+0x3d/0x3d
 [<c1119d13>] inode_wait_for_writeback+0x1d/0x28
 [<c110a7a3>] evict+0x83/0x15d
 [<c110b2a1>] iput+0xc3/0x137
 [<c12a5672>] nilfs_dispose_list+0xfc/0x14b
 [<c12a5a37>] nilfs_transaction_unlock+0x55/0x5e
 [<c12a81d0>] nilfs_segctor_thread+0xd5/0x2ad
 [<c12a80fb>] ? nilfs_segctor_construct+0x229/0x229
 [<c104b557>] kthread+0xa7/0xa9
 [<c153bf37>] ret_from_kernel_thread+0x1b/0x28
 [<c104b4b0>] ? insert_kthread_work+0x63/0x63

kdb> btp 2264
Stack traceback for pid 2264
0xdde9e860     2264     2187  0    0   D  0xdde9eb18  sync
 dbf43e3c 00000096 16f4459c 0000003b 00000000 dde9e860 5d9bbbf5 0000001d
 dbf42000 dde9e860 0000001d 00163d3a 00000000 5d9ce464 0000001d 00000000
 c107308b 00000000 dbf43e28 c10587b4 df016580 00000086 dde9ecd0 dbf43e28
Call Trace:
 [<c107308b>] ? trace_hardirqs_off_caller+0x1a/0x116
 [<c10587b4>] ? sched_clock_cpu+0x8f/0xe2
 [<c15391fb>] schedule+0x22/0x4c
 [<c1536438>] schedule_timeout+0xf8/0x1e8
 [<c153abb5>] ? _raw_spin_unlock_irq+0x27/0x36
 [<c107306f>] ? trace_hardirqs_on+0xb/0xd
 [<c1539596>] wait_for_completion+0x9e/0xce
 [<c1055d3f>] ? try_to_wake_up+0x138/0x138
 [<c111a986>] sync_inodes_sb+0xc3/0x1f2
 [<c1539523>] ? wait_for_completion+0x2b/0xce
 [<c111d955>] sync_inodes_one_sb+0x15/0x17
 [<c10f5eb9>] iterate_supers+0xc5/0xc7
 [<c111d940>] ? SyS_tee+0x2c5/0x2c5
 [<c111dad2>] sys_sync+0x31/0x78
 [<c153bfbb>] sysenter_do_call+0x12/0x32
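
For reference, the shutdown order tried in Scenario #2 (unmount the non-NILFS2 filesystems, run sync, then unmount the NILFS2 filesystems) can be written down as a small dry-run planner. This is only a sketch of that ordering, not the actual rc.6 logic: the function name plan_shutdown, the /proc/mounts-style input, the choice to remount / read-only rather than unmount it, and the device names in the usage example are all assumptions made for illustration. It only builds the command strings and runs nothing.

```python
def plan_shutdown(mounts_text):
    """Return shutdown commands, in order, for a /proc/mounts-style listing.

    Hypothetical helper: it only builds command strings (dry run).
    """
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[:3]
        if not device.startswith("/dev/"):
            continue  # skip virtual filesystems such as proc and sysfs
        entries.append((mountpoint, fstype))

    # Deeper mount points first, so nested mounts go before their parents.
    def depth(entry):
        return len(entry[0].rstrip("/").split("/"))

    entries.sort(key=depth, reverse=True)

    def command(mountpoint):
        # Assumption: the root filesystem is remounted read-only, not unmounted.
        if mountpoint == "/":
            return "mount -o remount,ro /"
        return "umount " + mountpoint

    others = [command(mp) for mp, fs in entries if fs != "nilfs2"]
    nilfs = [command(mp) for mp, fs in entries if fs == "nilfs2"]
    # Order from Scenario #2: non-NILFS2 first, then sync, then NILFS2.
    return others + ["sync"] + nilfs


if __name__ == "__main__":
    # Device names below are made up for the example.
    sample = (
        "/dev/sda1 /boot jfs rw 0 0\n"
        "/dev/sda2 / nilfs2 rw 0 0\n"
        "/dev/sda3 /tmp nilfs2 rw 0 0\n"
        "proc /proc proc rw 0 0\n"
    )
    for cmd in plan_shutdown(sample):
        print(cmd)
```

With this ordering, sync is the first step that has to flush the NILFS2 filesystems, which matches where the shutdown stopped in Scenario #2.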