Hello,

We recently changed our SSD, and after a power cycle under load we encountered corruption in many files on the NILFS partition. Some I/O processing was in progress when the corruption occurred (possibly during the power cycle itself). I checked the SSD with SMART tools and no errors appear to have been logged.

Here is the dmesg(1) output from a mount and a read access of one of the affected files:

NILFS nilfs_fill_super: start(silent=0)
NILFS warning: mounting unchecked fs
NILFS(recovery) nilfs_search_super_root: found super root: segnum=1824, seq=2164205, pseg_start=3737132, pseg_offset=1613
NILFS: recovery complete.
segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
NILFS warning: mounting fs with errors
NILFS nilfs_fill_super: mounted filesystem
attempt to access beyond end of device
sda3: rw=0, want=9331952664445036944, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=8178943302301875000, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=11216631730677685184, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=7566444304283562424, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=7161651204113109256, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=5463845364605981576, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=8888728157704767904, limit=49930020
attempt to access beyond end of device
sda3: rw=0, want=9331952664445036944, limit=49930020

The output of nilfs-tune is:

nilfs-tune -l /dev/sda3
nilfs-tune 2.1.0
Filesystem volume name:   /writable
Filesystem UUID:          11d71018-2c18-42ad-a842-f475e6b1c449
Filesystem magic number:  0x3434
Filesystem revision #:    2.0
Filesystem features:      (none)
Filesystem state:         invalid or mounted,error
Filesystem OS type:       Linux
Block size:               4096
Filesystem created:       Mon Jul 11 17:21:39 2011
Last mount time:          Tue Sep 11 10:41:22 2012
Last write time:          Tue Sep 11 10:41:22 2012
Mount count:              972
Maximum mount count:      50
Reserve blocks uid:       0 (user root)
Reserve blocks gid:       0 (group root)
First inode:              11
Inode size:               128
DAT entry size:           32
Checkpoint size:          192
Segment usage size:       16
Number of segments:       3047
Device size:              25564170240
First data block:         1
# of blocks per segment:  2048
Reserved segments %:      5
Last checkpoint #:        2555913
Last block address:       3737132
Last sequence #:          2164205
Free blocks count:        5777408
Commit interval:          0
# of blks to create seg:  0
CRC seed:                 0xb9934a73
CRC check sum:            0x1f5cb561
CRC check data size:      0x00000118

The problem first appeared a few days ago, possibly on the power cycle, and it seems to have been growing since.
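If I am reading the "beyond end of device" messages right, limit=49930020 is in 512-byte sectors, which matches the device size reported by nilfs-tune (49930020 x 512 = 25564170240 bytes), so the want= values above look like essentially random 64-bit numbers rather than addresses just slightly past the end of the partition.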
The first error in /var/log/messages (incidentally, even the messages.1 file was corrupted in the middle) was in this directory, which now gives an I/O error on any readdir of the directory:

NILFS error (device sda3): nilfs_check_page: bad entry in directory #10778: rec_len is smaller than minimal - offset=0, inode=733085696, rec_len=0, name_len=193
NILFS error (device sda3): nilfs_readdir: bad page in #10778

Later this same directory gave the same error again, and later still this one as well:

NILFS error (device sda3): nilfs_check_page: bad entry in directory #10778: directory entry across blocks - offset=0, inode=1346725220, rec_len=24320, name_len=90
NILFS error (device sda3): nilfs_readdir: bad page in #10778
[<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
[<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
[<c04c441d>] nilfs_btree_lookup_contig+0x54/0x349
[<f88634d8>] scsi_done+0x0/0x16 [scsi_mod]
[<f88df964>] ata_scsi_translate+0x107/0x12c [libata]
[<f88634d8>] scsi_done+0x0/0x16 [scsi_mod]
[<f88e20ae>] ata_scsi_queuecmd+0x18f/0x1ac [libata]
[<f88e20c3>] ata_scsi_queuecmd+0x1a4/0x1ac [libata]
[<c04f6ca4>] elv_next_request+0x127/0x134
[<c04c29a3>] nilfs_bmap_lookup_contig+0x31/0x43
[<c04bd214>] nilfs_get_block+0xb9/0x227
[<c04f6d78>] elv_insert+0xc7/0x160
[<c0495970>] do_mpage_readpage+0x2a4/0x5fd
[<c04bd15b>] nilfs_get_block+0x0/0x227
[<c0458ba8>] find_lock_page+0x1a/0x7e
[<c045b314>] find_or_create_page+0x31/0x88
[<c04c0a62>] __nilfs_get_page_block+0x70/0x8a
[<c04c1171>] nilfs_grab_buffer+0x53/0x11a
[<c0458d64>] add_to_page_cache+0x91/0xa2
[<c0495da9>] mpage_readpages+0x82/0xb6
[<c04bd15b>] nilfs_get_block+0x0/0x227
[<c045d2c9>] __alloc_pages+0x69/0x2cf
[<c04bc651>] nilfs_readpages+0x0/0x15
[<c045e800>] __do_page_cache_readahead+0x11d/0x183
[<c04bd15b>] nilfs_get_block+0x0/0x227
[<c045e8ac>] blockable_page_cache_readahead+0x46/0x99
[<c045ea3f>] page_cache_readahead+0xb3/0x178
[<c0459270>] do_generic_mapping_read+0xb8/0x380
[<c0459daa>] __generic_file_aio_read+0x16a/0x1a3
[<c045887d>] file_read_actor+0x0/0xd5
[<c0459e1e>] generic_file_aio_read+0x3b/0x42
[<c0475b83>] do_sync_read+0xb6/0xf1
[<c0476cbb>] file_move+0x27/0x32
[<c043607b>] autoremove_wake_function+0x0/0x2d
[<c0475acd>] do_sync_read+0x0/0xf1
[<c047645c>] vfs_read+0x9f/0x141
[<c04768aa>] sys_read+0x3c/0x63
[<c0404f17>] syscall_call+0x7/0xb
=======================
NILFS: btree level mismatch: 36 != 1

Later we see corruption in many more files and directories on the NILFS partition, many with different errors and stack traces.

Has anybody seen these errors and worked around them? If so, can you please let me know how? Any thoughts on whether this is an SSD issue or a NILFS bug? If it is a NILFS bug, have these problems been fixed in a newer kernel module?

Thanks a lot.
Zahid
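P.S. In case it helps anyone reading the directory errors: my understanding is that the NILFS on-disk directory entry is an ext2-style record, and the sketch below is only my reading of the error messages, not the actual kernel code (the struct name, 8-byte rounding and helper names are illustrative assumptions). It shows the kind of checks that "rec_len is smaller than minimal" and "directory entry across blocks" correspond to; with rec_len=0 (name_len=193) and rec_len=24320 in a 4096-byte block, both checks obviously fail, so whole directory blocks look like they were overwritten with garbage.

#include <stdint.h>
#include <stddef.h>

/* Assumed ext2-style directory record, as implied by the error fields. */
struct dirent_on_disk {
        uint64_t inode;     /* inode number of the entry */
        uint16_t rec_len;   /* distance in bytes to the next entry */
        uint8_t  name_len;  /* length of the name that follows */
        uint8_t  file_type;
        char     name[];    /* name_len bytes, padded to the record length */
};

/* Smallest legal record length for a given name length (assumed 8-byte rounding). */
static inline unsigned int min_rec_len(unsigned int name_len)
{
        unsigned int header = offsetof(struct dirent_on_disk, name); /* 12 bytes */
        return (header + name_len + 7) & ~7u;
}

/* Returns 0 if the entry at 'offset' within a 'blocksize' directory block looks sane. */
static int check_entry(const struct dirent_on_disk *de,
                       unsigned int offset, unsigned int blocksize)
{
        if (de->rec_len < min_rec_len(1))
                return -1;   /* "rec_len is smaller than minimal" (rec_len=0 trips this) */
        if (de->rec_len < min_rec_len(de->name_len))
                return -1;   /* stored name does not fit inside the record */
        if (offset + de->rec_len > blocksize)
                return -1;   /* "directory entry across blocks" (rec_len=24320 > 4096) */
        return 0;
}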