Hi,

I use kmod-nilfs2-0.4.3-1.el6.x86_64.

On 2012-12-22, at 22:12, Seiji Kihara <kihara@xxxxxxxx> wrote:

> Hello,
>
> (2012/12/20 19:16), 张 磊 wrote:
>> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
>
> If you use the nilfs2 kernel module for RHEL 6 clones,
> 'rpm -q kmod-nilfs2' will help.
>
> http://www.nilfs.org/en/pkg_centos.html
> https://github.com/nilfs-dev/nilfs2-kmod-centos6
>
> Regards,
>
> Seiji
>
>> 2. nilfs-utils version: nilfs-utils-2.1.4
>> 3. "mount" output:
>> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>>
>> 4. "df -h" output:
>> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0
>>
>> 5. "lscp" output:
>>    CNO DATE       TIME     MODE FLG NBLKINC   ICNT
>>      2 2012-12-03 14:03:01 ss   -        14      3
>> 580481 2012-12-20 16:11:25 cp   -       293 697667
>> 580482 2012-12-20 16:11:25 cp   -       130 697666
>> 580483 2012-12-20 16:11:25 cp   -       225 697664
>> 580484 2012-12-20 16:11:25 cp   -       143 697663
>> 580485 2012-12-20 16:11:26 cp   -       311 697659
>> 580486 2012-12-20 16:11:27 cp   -       328 697657
>> 580487 2012-12-20 16:11:27 cp   -       263 697655
>> 580488 2012-12-20 16:11:27 cp   -       118 697653
>> 580489 2012-12-20 16:11:28 cp   -       230 697651
>> 580490 2012-12-20 16:11:28 cp   -       272 697649
>> 580491 2012-12-20 16:11:28 cp   -       148 697648
>> 580492 2012-12-20 16:11:29 cp   -       139 697647
>> 580493 2012-12-20 16:11:29 cp   -       273 697645
>> 580494 2012-12-20 16:11:29 cp   -       147 697644
>> 580495 2012-12-20 16:11:30 cp   -       271 697641
>> 580496 2012-12-20 16:11:31 cp   -       526 697636
>> 580497 2012-12-20 16:11:34 cp   -      1684 697625
>> 580498 2012-12-20 16:11:37 cp   -       983 697609
>> 580499 2012-12-20 16:11:38 cp   -       421 697605
>> 580500 2012-12-20 16:11:40 cp   -      1019 697594
>> 580501 2012-12-20 16:11:40 cp   -       143 697593
>> 580502 2012-12-20 16:11:41 cp   -      1536 697592
>> 580503 2012-12-20 16:11:41 cp   -       373 697590
>> 580504 2012-12-20 16:11:42 cp   -       312 697587
>> 580505 2012-12-20 16:11:42 cp   -       102 697586
>> 580506 2012-12-20 16:11:43 cp   -       274 697584
>> 580507 2012-12-20 16:11:43 cp   -       270 697582
>> 580508 2012-12-20 16:11:43 cp   -       118 697581
>> 580509 2012-12-20 16:11:43 cp   -       133 697580
>> 580510 2012-12-20 16:11:44 cp   -       321 697578
>> 580511 2012-12-20 16:11:44 cp   -       245 697576
>> 580512 2012-12-20 16:11:45 cp   -       394 697573
>> 580513 2012-12-20 16:11:45 cp   -       121 697572
>> 580514 2012-12-20 16:11:45 cp   -       245 697569
>> 580515 2012-12-20 16:11:52 cp   -      2705 697543
>> 580516 2012-12-20 16:11:55 cp   -      2590 697504
>> 580517 2012-12-20 16:11:59 cp   -      2418 697453
>> 580518 2012-12-20 16:12:00 cp   -       866 697436
>> 580519 2012-12-20 16:12:01 cp   -       864 697420
>> 580520 2012-12-20 16:12:05 cp   -      1765 697357
>> 580521 2012-12-20 16:12:05 cp   -       120 697356
>> 580522 2012-12-20 16:12:06 cp   -       820 697332
>> 580523 2012-12-20 16:12:09 cp   -      1642 697174
>> 580524 2012-12-20 16:12:09 cp   -        89 697173
>> 580525 2012-12-20 16:12:10 cp   -        56 697173
>> 580526 2012-12-20 16:12:42 cp   -       763 697173
>>
>> 6. "lssu" output:
>> It is too large; please download it: http://d.pr/f/vnoR
>>
>> 7. "nilfs-tune -l" output (superblock content):
>>
>> nilfs-tune 2.1.4
>> Filesystem volume name:   (none)
>> Filesystem UUID:          dcfb7152-a342-48d0-a712-212a3062395e
>> Filesystem magic number:  0x3434
>> Filesystem revision #:    2.0
>> Filesystem features:      (none)
>> Filesystem state:         invalid or mounted,error
>> Filesystem OS type:       Linux
>> Block size:               4096
>> Filesystem created:       Mon Dec 3 13:56:51 2012
>> Last mount time:          Thu Dec 20 17:44:03 2012
>> Last write time:          Thu Dec 20 17:44:03 2012
>> Mount count:              13
>> Maximum mount count:      50
>> Reserve blocks uid:       0 (user root)
>> Reserve blocks gid:       0 (group root)
>> First inode:              11
>> Inode size:               128
>> DAT entry size:           32
>> Checkpoint size:          192
>> Segment usage size:       16
>> Number of segments:       1246464
>> Device size:              10456104173568
>> First data block:         1
>> # of blocks per segment:  2048
>> Reserved segments %:      5
>> Last checkpoint #:        580526
>> Last block address:       1040286376
>> Last sequence #:          1753809
>> Free blocks count:        973875200
>> Commit interval:          60
>> # of blks to create seg:  0
>> CRC seed:                 0x3adfb6c3
>> CRC check sum:            0x8468fbbf
>> CRC check data size:      0x00000118
>>
>>
>> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>>
>> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
>> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
>> Dec 18 15:55:02 localhost kernel: Call Trace:
>> Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
>> Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>>
>>
>>
>> On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
>>
>>> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>>>> Hi,
>>>>
>>>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>>>>
>>>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>>>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 20 16:03:55 localhost kernel:
>>>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 20 16:03:55 localhost kernel:
>>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>>>>
>>>> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>>>>
>>>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>>>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>>>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>>>> Dec 20 16:12:08 localhost kernel:
>>>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>>>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>>>>
>>>> I tried a third remount, but it failed. The server went down and was restarted.
>>>>
>>>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>>>> Dec 20 16:12:42 localhost kernel:
>>>>
>>> Yes, that is bad. In earlier cases, a remount resolved the problem.
>>>
>>> So, is your NILFS2 volume currently mounted read-only?
>>>
>>> Could you share more details about your environment? They are needed to
>>> understand the situation and to try to reproduce it. I need to know:
>>> 1. Linux kernel version.
>>> 2. nilfs-utils version.
>>> 3. "mount" output.
>>> 4. "df -h" output.
>>> 5. "lscp" output.
>>> 6. "lssu" output.
>>> 7. "nilfs-tune -l" output (superblock content)
>>>
>>>> I found that fsck.nilfs2 was added in nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>>>>
>>> The latest version of nilfs-utils is 2.1.4. fsck.nilfs2 is currently at an early
>>> stage of development, and "v4" is the version of the fsck.nilfs2 patchset. You can
>>> try fsck.nilfs2 after applying this patchset to the source code of
>>> nilfs-utils 2.1.4.
>>> However, fsck.nilfs2 can check only superblocks
>>> and segment summary headers, and it cannot perform a complete recovery. So I think
>>> that it will be useless for you.
>>>
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>>
>>>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>>>> Hello.
>>>>>> My nilfs filesystem suddenly became read-only. I saw these logs in /var/log/messages:
>>>>>>
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> ……………………………………………………
>>>>>>
>>>>>> How can I fix this? There is 6 TiB of data on my disk, and I don't want to format it.
>>>>>> I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem?
>>>>>> When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>>>>
>>>>> Yes, this issue was reported earlier. As I understand it, you can simply
>>>>> remount your filesystem in read-write mode and continue using your
>>>>> NILFS2 filesystem.
>>>>>
>>>>> If you encounter any trouble with remounting, please report
>>>>> it.
>>>>>
>>>>> With the best regards,
>>>>> Vyacheslav Dubeyko.
>>>>>
>>>>>
>>>>>> Elmer Zhang
>
> --
> Seiji Kihara
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
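
The NILFS kernel messages quoted in this thread have a regular shape, so the affected block and inode numbers can be tallied from a log file. The Python sketch below is only an illustration: the regular expressions are inferred from the log lines quoted above rather than taken from the nilfs2 source, and the names summarize and SAMPLE are hypothetical. It only reads log text and does not touch the filesystem.

```python
import re
from collections import Counter

# Patterns inferred from the kernel messages quoted in this thread (an assumption,
# not a format guaranteed by the nilfs2 module).
BAD_NODE = re.compile(
    r"NILFS: bad btree node \(blocknr=(\d+)\): "
    r"level = (\d+), flags = (0x[0-9a-fA-F]+), nchildren = (\d+)"
)
BROKEN_BMAP = re.compile(
    r"NILFS error \(device (\S+)\): (\w+): broken bmap \(inode number=(\d+)\)"
)

# Three lines copied from the logs above, used as sample input.
SAMPLE = [
    "Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): "
    "nilfs_bmap_lookup_contig: broken bmap (inode number=321775)",
    "Dec 20 16:12:08 localhost kernel: NILFS: bad btree node "
    "(blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088",
    "Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): "
    "nilfs_bmap_last_key: broken bmap (inode number=321775)",
]


def summarize(lines):
    """Count messages per bad btree block and per (device, inode) pair."""
    blocks = Counter()
    inodes = Counter()
    for line in lines:
        match = BAD_NODE.search(line)
        if match:
            blocks[int(match.group(1))] += 1
            continue
        match = BROKEN_BMAP.search(line)
        if match:
            inodes[(match.group(1), int(match.group(3)))] += 1
    return blocks, inodes


blocks, inodes = summarize(SAMPLE)
for blocknr, count in sorted(blocks.items()):
    print(f"bad btree node at blocknr={blocknr}: {count} message(s)")
for (device, ino), count in sorted(inodes.items()):
    print(f"broken bmap on {device}, inode {ino}: {count} message(s)")
```

To run it against a real log, pass an open file object (for example one opened on /var/log/messages) to summarize in place of SAMPLE.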