On 2017/1/11 23:34, Theodore Ts'o wrote:
> On Wed, Jan 11, 2017 at 05:07:29PM +0800, zhangyi (F) wrote:
>>
>> (1) The file we want to unlink have many hard links, but only one dcache entry in memory.
>> (2) open this file, but it's inode->i_nlink read from disk was 1 (too low).
>> (3) some one call rename and drop it's i_nlink to zero.
>> (4) it's inode is still in use and do not destroy (not closed), at the same time,
>>     some others open it's hard link and create a dcache entry.
>> (5) call rename again and it's i_nlink will still underflow and cause memory corruption.
>
> Do you have reproducers that make it easy to reproduce situations like
> this?  (It shouldn't be hard to write, but if you have them already
> will save me some effort.  :-)
>

I made a reproducer. The following steps reproduce this problem easily:

1) Mount an ext4 file system, then create 3 files and 1 hard link:

   #mount /dev/sdax /mnt
   #cd /mnt
   #touch old_file1 old_file2 new_file
   #ln new_file new_link1

2) Unmount the file system and use debugfs to change new_file's
   links_count to 1 (it is really 2), which simulates an on-disk
   inconsistency:

   #umount /mnt
   #debugfs /dev/sdax -w
   debugfs: set_inode_field new_file links_count 1

3) Mount the file system again and run the following program in /mnt
   (note: do not run ls first, because it would create the second
   dcache entry):

   #include <errno.h>
   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   #define RENAME_OLD_FILE_1 "old_file1"
   #define RENAME_OLD_FILE_2 "old_file2"
   #define RENAME_NEW_FILE   "new_file"
   #define NEW_FILE_LINK_1   "new_link1"

   int main(int argc, char *argv[])
   {
       int fd = 0;
       int err = 0;

       /* Keep new_file open so its inode stays in memory. */
       fd = open(RENAME_NEW_FILE, O_RDONLY);
       if (fd < 0) {
           printf("open error:%d\n", errno);
           return -1;
       }

       /* Replace new_file: i_nlink drops from 1 (the forced value) to 0. */
       err = rename(RENAME_OLD_FILE_1, RENAME_NEW_FILE);
       if (err < 0) {
           printf("rename error:%d\n", errno);
           close(fd);
           return -1;
       }

       /* Replace new_link1, another name for the same inode: i_nlink underflows. */
       err = rename(RENAME_OLD_FILE_2, NEW_FILE_LINK_1);
       if (err < 0) {
           printf("rename error:%d\n", errno);
           close(fd);
           return -1;
       }

       close(fd);
       return 0;
   }

4) After this, new_file's inode->i_nlink has underflowed and the inode
   has been added to the
orphan list. The kernel dump looks like this:

   ------------[ cut here ]------------
   WARNING: CPU: 0 PID: 1814 at fs/inode.c:282 drop_nlink+0x3e/0x50
   ...
   Call Trace:
    dump_stack+0x63/0x86
    __warn+0xcb/0xf0
    warn_slowpath_null+0x1d/0x20
    drop_nlink+0x3e/0x50
    ext4_rename+0x532/0x8c0
    ext4_rename2+0x1d/0x30
    vfs_rename+0x728/0x940
    ? __lookup_hash+0x20/0xa0
    SyS_rename+0x3ba/0x3e0
    entry_SYSCALL_64_fastpath+0x1a/0xa9
   ...
   ---[ end trace b157dacbc891e6e8 ]---

5) Then trigger memory reclaim. The inode is destroyed even though it is
   still on the orphan list:

   #echo 3 > /proc/sys/vm/drop_caches

   Kernel dump:

   EXT4-fs (sdb1): Inode 16 (ffff98f4b3285c20): orphan list check failed!
   ...
   ffff98f4b3285d30: fa87e800 ffff98f4 b3285e80 ffff98f4  .........^(.....
   ffff98f4b3285d40: b20829d8 ffff98f4 00000010 00000000  .)..............
   ffff98f4b3285d50: ffffffff 00000000 00000000 00000000  ................
   ...
   Call Trace:
    dump_stack+0x63/0x86
    ext4_destroy_inode+0xa0/0xb0
    destroy_inode+0x3b/0x60
    evict+0x130/0x1c0
    dispose_list+0x4d/0x70
    prune_icache_sb+0x5a/0x80
    super_cache_scan+0x14b/0x1a0
    shrink_slab.part.40+0x1f5/0x420
    shrink_slab+0x29/0x30
    drop_slab_node+0x31/0x60
    drop_slab+0x3f/0x70
    drop_caches_sysctl_handler+0x71/0xc0
    proc_sys_call_handler+0xea/0x110
    proc_sys_write+0x14/0x20
    __vfs_write+0x37/0x160
    ? selinux_file_permission+0xd7/0x110
    ? security_file_permission+0x3b/0xc0
    vfs_write+0xb5/0x1a0
    SyS_write+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x1a/0xa9
   ...
   bash (1594): drop_caches: 3

6) Some time later, any change to the orphan list will cause memory
   corruption, because the freed inode is still linked into it.

Thanks.
zhangyi

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
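To make the link-count arithmetic of steps 2 to 4 concrete, here is a minimal user-space model. It is not kernel code: struct model_inode, model_drop_nlink(), model_rename_over() and model_reproducer() are hypothetical names. It assumes that each rename over an existing name drops exactly one link on the target inode (as the ext4_rename -> drop_nlink trace suggests), that the inode is put on the orphan list when its count reaches zero, and that drop_nlink() only warns on a zero count before decrementing the unsigned i_nlink anyway.

```c
/*
 * Hypothetical user-space model of the i_nlink accounting in the
 * reproducer. None of these names exist in the kernel.
 */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

struct model_inode {
	unsigned int i_nlink;  /* in-memory link count, loaded from disk */
	bool on_orphan_list;   /* set once i_nlink reaches zero */
	int warnings;          /* how many times the zero-count warning fired */
};

/* Models drop_nlink(): warn on a zero count, but decrement regardless. */
static void model_drop_nlink(struct model_inode *inode)
{
	if (inode->i_nlink == 0) {
		inode->warnings++;
		fprintf(stderr, "WARNING: drop_nlink with i_nlink == 0\n");
	}
	inode->i_nlink--;  /* unsigned: 0 wraps around to UINT_MAX */
}

/* Assumption: renaming over a name that links to 'target' drops one link. */
static void model_rename_over(struct model_inode *target)
{
	assert(target != NULL);
	model_drop_nlink(target);
	if (target->i_nlink == 0)
		target->on_orphan_list = true;
}

/* Two names (new_file, new_link1) but links_count forced to 1 on disk. */
struct model_inode model_reproducer(void)
{
	struct model_inode new_file = { .i_nlink = 1 };

	model_rename_over(&new_file);  /* rename(old_file1, new_file): 1 -> 0 */
	model_rename_over(&new_file);  /* rename(old_file2, new_link1): 0 -> UINT_MAX */
	return new_file;
}
```

Calling model_reproducer() leaves i_nlink at UINT_MAX with the warning fired once and the inode still flagged as orphaned. Because the wrapped count is nonzero, nothing in the model would ever take the inode back off the orphan list, which mirrors the state that step 5 then trips over.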