Hi!

With debugging being discussed here, I wanted to pass on an issue that has no error message associated with it. This will be one of those error reports that Vyacheslav will find not too informative. He's been trying to help with those moments where NILFS2 will stop responding for no visible reason, but whatever issue has 100% reproducibility on my PC has no reproducibility on his PC. This is a new test; maybe this test will work.

With NILFS2 '-o order=strict' mounts, at least, there is a repeatable deadlock-like behavior between segctord and a process that syncs. Usually, that process is lilo, but other programs can cause this behavior at random. After this issue is reached, clean shutdowns are almost impossible.

At least here on an old Pentium III--512 MB RAM, Slackware 14.1, kernel 3.16.0-rc2, debug kernel config, old IDE drives--this script reproduces the issue:

# ==== script ====
#!/bin/bash

hdparm -W 0 /dev/hdc   # write cache off
mkfs.nilfs2 -f /dev/hdc4
mount -t nilfs2 -o order=strict /dev/hdc4 /mnt/tmp
cd /mnt/tmp
while true; do
  fs_mark -D 4 -t 4 -n 50 -s 512 -L 5 -d todelete
  rm -r todelete
  sync
  sleep 1
done
# ==== end of script ====

Should your PC be too fast to make a deadlock happen, increase any or all of the numbers in the fs_mark command line. On this PC, it goes through the loop exactly once.
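
A minimal sketch of one way the fs_mark numbers could be scaled together on a faster machine. The fs_mark_cmd helper and its scale argument are hypothetical and not part of the script above; the helper only prints a command line and does not run fs_mark itself.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original script):
# print the fs_mark command line from the reproducer with every numeric
# argument multiplied by the given scale factor (default 1).
fs_mark_cmd() {
    local scale=${1:-1}
    echo "fs_mark -D $((4 * scale)) -t $((4 * scale)) -n $((50 * scale))" \
         "-s $((512 * scale)) -L $((5 * scale)) -d todelete"
}

# Print the command line for a scale factor of 2.
fs_mark_cmd 2
```

Inside the loop, the fixed fs_mark line could then be replaced by $(fs_mark_cmd 2), raising the factor until the hang shows up. Scaling all the numbers by the same factor is only one choice; changing any single number may be enough.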

After forcing a crash and collecting the core dump, I see this using the crash 7.0.4 program:

crash> bt 274
PID: 274    TASK: dd9caac0  CPU: 0   COMMAND: "segctord"
 #0 [c0063d48] __schedule at c1641357
 #1 [c0063dc8] schedule at c1641a7e
 #2 [c0063dd0] inode_wait at c11467c8
 #3 [c0063dd8] __wait_on_bit at c1642133
 #4 [c0063df0] __inode_wait_for_writeback at c1156d98
 #5 [c0063e24] inode_wait_for_writeback at c1159fff
 #6 [c0063e34] evict at c11475de
 #7 [c0063e48] iput at c11482ef
 #8 [c0063e60] nilfs_dispose_list at c12f104a
 #9 [c0063ecc] nilfs_transaction_unlock at c12f14e9
#10 [c0063edc] nilfs_segctor_thread at c12f3fa1
#11 [c0063f28] kthread at c105fb56
#12 [c0063fb0] ret_from_kernel_thread at c164729e

crash> bt 301
PID: 301    TASK: dd9cc020  CPU: 0   COMMAND: "sync"
 #0 [de9e1dac] __schedule at c1641357
 #1 [de9e1e2c] schedule at c1641a7e
 #2 [de9e1e34] schedule_timeout at c1640a80
 #3 [de9e1ea8] wait_for_completion at c1642436
 #4 [de9e1ed4] sync_inodes_sb at c115ae12
 #5 [de9e1f7c] sync_inodes_one_sb at c115e620
 #6 [de9e1f84] iterate_supers at c112d1e8
 #7 [de9e1fa0] sys_sync at c115e85c
 #8 [de9e1fb0] ia32_sysenter_target at c164736b
    EAX: 00000024  EBX: bf8b1954  ECX: 00000000  EDX: b775517c
    DS:  007b      ESI: 00000001  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8b187c  EBP: bf8b18b8  GS:  0000
    CS:  0073      EIP: b776da8c  ERR: 00000024  EFLAGS: 00000246

The behavior seems to happen more easily on new filesystems, or on a filesystem whose old checkpoints have all been removed by rmcp and nilfs-clean. It gets better once the filesystem is full again and nilfs_cleanerd has had a chance to run automatically.

If the issue can be reproduced with an order=relaxed mount, I have not tested it sufficiently. Spot-tests seem OK.

The full core dump is available, should you need it.

Thanks!

Michael