Hello,
(2012/12/20 19:16), 张 磊 wrote:
1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
If you use the nilfs2 kernel module for RHEL 6 clones,
'rpm -q kmod-nilfs2' will show the installed module package version.
http://www.nilfs.org/en/pkg_centos.html
https://github.com/nilfs-dev/nilfs2-kmod-centos6
Regards,
Seiji
2. nilfs-utils version: nilfs-utils-2.1.4
3. "mount" output:
/dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
4. "df -h" output:
/dev/sdb2 9.6T 5.9T 3.2T 66% /data0
5. "lscp" output:
CNO DATE TIME MODE FLG NBLKINC ICNT
2 2012-12-03 14:03:01 ss - 14 3
580481 2012-12-20 16:11:25 cp - 293 697667
580482 2012-12-20 16:11:25 cp - 130 697666
580483 2012-12-20 16:11:25 cp - 225 697664
580484 2012-12-20 16:11:25 cp - 143 697663
580485 2012-12-20 16:11:26 cp - 311 697659
580486 2012-12-20 16:11:27 cp - 328 697657
580487 2012-12-20 16:11:27 cp - 263 697655
580488 2012-12-20 16:11:27 cp - 118 697653
580489 2012-12-20 16:11:28 cp - 230 697651
580490 2012-12-20 16:11:28 cp - 272 697649
580491 2012-12-20 16:11:28 cp - 148 697648
580492 2012-12-20 16:11:29 cp - 139 697647
580493 2012-12-20 16:11:29 cp - 273 697645
580494 2012-12-20 16:11:29 cp - 147 697644
580495 2012-12-20 16:11:30 cp - 271 697641
580496 2012-12-20 16:11:31 cp - 526 697636
580497 2012-12-20 16:11:34 cp - 1684 697625
580498 2012-12-20 16:11:37 cp - 983 697609
580499 2012-12-20 16:11:38 cp - 421 697605
580500 2012-12-20 16:11:40 cp - 1019 697594
580501 2012-12-20 16:11:40 cp - 143 697593
580502 2012-12-20 16:11:41 cp - 1536 697592
580503 2012-12-20 16:11:41 cp - 373 697590
580504 2012-12-20 16:11:42 cp - 312 697587
580505 2012-12-20 16:11:42 cp - 102 697586
580506 2012-12-20 16:11:43 cp - 274 697584
580507 2012-12-20 16:11:43 cp - 270 697582
580508 2012-12-20 16:11:43 cp - 118 697581
580509 2012-12-20 16:11:43 cp - 133 697580
580510 2012-12-20 16:11:44 cp - 321 697578
580511 2012-12-20 16:11:44 cp - 245 697576
580512 2012-12-20 16:11:45 cp - 394 697573
580513 2012-12-20 16:11:45 cp - 121 697572
580514 2012-12-20 16:11:45 cp - 245 697569
580515 2012-12-20 16:11:52 cp - 2705 697543
580516 2012-12-20 16:11:55 cp - 2590 697504
580517 2012-12-20 16:11:59 cp - 2418 697453
580518 2012-12-20 16:12:00 cp - 866 697436
580519 2012-12-20 16:12:01 cp - 864 697420
580520 2012-12-20 16:12:05 cp - 1765 697357
580521 2012-12-20 16:12:05 cp - 120 697356
580522 2012-12-20 16:12:06 cp - 820 697332
580523 2012-12-20 16:12:09 cp - 1642 697174
580524 2012-12-20 16:12:09 cp - 89 697173
580525 2012-12-20 16:12:10 cp - 56 697173
580526 2012-12-20 16:12:42 cp - 763 697173
6. "lssu" output:
It's too large; please download it: http://d.pr/f/vnoR
7. "nilfs-tune -l" output (superblock content):
nilfs-tune 2.1.4
Filesystem volume name: (none)
Filesystem UUID: dcfb7152-a342-48d0-a712-212a3062395e
Filesystem magic number: 0x3434
Filesystem revision #: 2.0
Filesystem features: (none)
Filesystem state: invalid or mounted,error
Filesystem OS type: Linux
Block size: 4096
Filesystem created: Mon Dec 3 13:56:51 2012
Last mount time: Thu Dec 20 17:44:03 2012
Last write time: Thu Dec 20 17:44:03 2012
Mount count: 13
Maximum mount count: 50
Reserve blocks uid: 0 (user root)
Reserve blocks gid: 0 (group root)
First inode: 11
Inode size: 128
DAT entry size: 32
Checkpoint size: 192
Segment usage size: 16
Number of segments: 1246464
Device size: 10456104173568
First data block: 1
# of blocks per segment: 2048
Reserved segments %: 5
Last checkpoint #: 580526
Last block address: 1040286376
Last sequence #: 1753809
Free blocks count: 973875200
Commit interval: 60
# of blks to create seg: 0
CRC seed: 0x3adfb6c3
CRC check sum: 0x8468fbbf
CRC check data size: 0x00000118
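As a sanity check on the superblock dump above, the reported sizes can be cross-checked with a little arithmetic. The following is a minimal Python sketch; the function and key names are made up for illustration, and the idea that the segment count equals the device size divided by the segment size, rounded down, is an assumption rather than something stated in this thread.

```python
# Values copied from the "nilfs-tune -l" output above.
BLOCK_SIZE = 4096
BLOCKS_PER_SEGMENT = 2048
NUM_SEGMENTS = 1246464
DEVICE_SIZE = 10456104173568
FREE_BLOCKS = 973875200

TIB = 2 ** 40


def summarize():
    """Derive a few sizes from the superblock fields (illustrative only)."""
    seg_bytes = BLOCK_SIZE * BLOCKS_PER_SEGMENT
    free_bytes = FREE_BLOCKS * BLOCK_SIZE
    return {
        "segment_bytes": seg_bytes,
        # Assumption: whole segments only, any remainder is unused.
        "segments_that_fit": DEVICE_SIZE // seg_bytes,
        "free_bytes": free_bytes,
        "free_segments": FREE_BLOCKS // BLOCKS_PER_SEGMENT,
        "device_tib": DEVICE_SIZE / TIB,
        "free_tib": free_bytes / TIB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(name, value)
```

With these numbers a segment is 8 MiB, the number of whole segments that fit on the device matches the reported 1246464, and the free block count corresponds to roughly 3.6 TiB before any reserved space is subtracted.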
I found this in /var/log/messages; perhaps it is related to the bad btree node:
Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
Dec 18 15:55:02 localhost kernel: Call Trace:
Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
Hi,
I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 20 16:03:55 localhost kernel:
Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 20 16:03:55 localhost kernel:
Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
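Because the same two messages repeat many times in these logs, it can help to reduce them to the distinct block numbers and inodes involved. The sketch below is one way to do that in Python; the regular expressions are modelled only on the lines quoted in this thread (other kernel versions may word the messages differently), and the function name is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumption:
# the message wording is stable for this kernel module version).
BAD_NODE_RE = re.compile(
    r"NILFS: bad btree node \(blocknr=(?P<blocknr>\d+)\): "
    r"level = (?P<level>\d+), flags = (?P<flags>0x[0-9a-fA-F]+), "
    r"nchildren = (?P<nchildren>\d+)"
)
BROKEN_BMAP_RE = re.compile(
    r"NILFS error \(device (?P<device>\w+)\): (?P<func>\w+): "
    r"broken bmap \(inode number=(?P<ino>\d+)\)"
)


def summarize_nilfs_errors(lines):
    """Return (bad_nodes, broken_bmaps) as sets of distinct reports."""
    bad_nodes = set()
    broken_bmaps = set()
    for line in lines:
        m = BAD_NODE_RE.search(line)
        if m:
            bad_nodes.add((
                int(m.group("blocknr")),
                int(m.group("level")),
                int(m.group("flags"), 16),
                int(m.group("nchildren")),
            ))
            continue
        m = BROKEN_BMAP_RE.search(line)
        if m:
            broken_bmaps.add(
                (m.group("device"), m.group("func"), int(m.group("ino")))
            )
    return bad_nodes, broken_bmaps


# Two lines copied from the log excerpt above.
SAMPLE = [
    "Dec 20 16:03:55 localhost kernel: NILFS: bad btree node "
    "(blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088",
    "Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): "
    "nilfs_bmap_lookup_contig: broken bmap (inode number=321775)",
]

if __name__ == "__main__":
    nodes, bmaps = summarize_nilfs_errors(SAMPLE)
    print(nodes)
    print(bmaps)
```

Run over the excerpts in this thread, every "bad btree node" line refers to the same block (710153406) and every "broken bmap" line to the same inode (321775).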
I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
Dec 20 16:12:08 localhost kernel:
Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
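The header values in the "bad btree node" message can be compared against the limits a non-root NILFS2 B-tree node would be expected to satisfy. The sketch below is a Python approximation, not the kernel's code. All constants in it are assumptions based on my reading of the on-disk format: an 8-byte node header, 8 bytes of extra padding in non-root nodes, 8-byte keys and pointers, level 1 as the lowest node level, 14 as the level limit, and 0x01 as the root flag.

```python
# All constants below are assumptions about the NILFS2 on-disk B-tree
# node format; verify them against the kernel headers before relying on them.
NODE_HEADER_SIZE = 8   # flags (1) + level (1) + nchildren (2) + pad (4)
EXTRA_PAD_SIZE = 8     # extra padding in non-root nodes
KEY_SIZE = 8
PTR_SIZE = 8
LEVEL_NODE_MIN = 1     # level 0 is assumed to denote data blocks
LEVEL_MAX = 14
NODE_ROOT_FLAG = 0x01


def max_children(block_size):
    """Largest child count that fits in one non-root node block."""
    usable = block_size - NODE_HEADER_SIZE - EXTRA_PAD_SIZE
    return usable // (KEY_SIZE + PTR_SIZE)


def node_problems(level, flags, nchildren, block_size=4096):
    """Return a list of reasons why a non-root node header looks invalid."""
    problems = []
    if not (LEVEL_NODE_MIN <= level < LEVEL_MAX):
        problems.append("level %d outside [%d, %d)"
                        % (level, LEVEL_NODE_MIN, LEVEL_MAX))
    if flags & NODE_ROOT_FLAG:
        problems.append("root flag set on a non-root node")
    limit = max_children(block_size)
    if not (0 <= nchildren <= limit):
        problems.append("nchildren %d exceeds limit %d" % (nchildren, limit))
    return problems


if __name__ == "__main__":
    # Values from the log: level = 0, flags = 0x2, nchildren = 25088.
    for reason in node_problems(level=0, flags=0x2, nchildren=25088):
        print(reason)
```

Under these assumptions a 4096-byte block holds at most 255 children, so both the level (0) and the child count (25088) from the log fall outside the expected range. That would suggest the block at that address does not contain a valid node header, although it says nothing about how the block came to be in that state.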
I tried a third remount, but it failed. The server went down and was restarted.
Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
Dec 20 16:12:42 localhost kernel:
Yes, that is bad. In earlier reports, remounting resolved the problem.
So, is your NILFS2 volume currently mounted read-only?
Could you share more details about your environment? I need them to
understand the situation and to try to reproduce it. I need to know:
1. Linux kernel version.
2. nilfs-utils version.
3. "mount" output.
4. "df -h" output.
5. "lscp" output.
6. "lssu" output.
7. "nilfs-tune -l" output (superblock content)
I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
The latest version of nilfs-utils is 2.1.4. fsck.nilfs2 is currently at an
early stage of development; "v4" is the version of the fsck.nilfs2 patchset.
You can try fsck.nilfs2 by applying this patchset to the nilfs-utils 2.1.4
source code. However, fsck.nilfs2 can only check superblocks and segment
summary headers, and it cannot fully recover a filesystem. So I think
it will be of no use to you.
With the best regards,
Vyacheslav Dubeyko.
On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
Hi,
On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
Hello.
My NILFS filesystem suddenly became read-only. I saw these logs in /var/log/messages:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
……………………………………………………
How can I fix this? There are 6 TiB of data on the disk, and I don't want to reformat it.
I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid it? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
Yes, this issue has been reported before. As I understand it, you can simply
remount your filesystem in read-write mode and continue using your
NILFS2 filesystem.
If you encounter any trouble with remounting, please report
it.
With the best regards,
Vyacheslav Dubeyko.
Elmer Zhang
--
Seiji Kihara