Re: NILFS: bad btree node

1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64 
2. nilfs-utils version: nilfs-utils-2.1.4
3. "mount" output:
/dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909) 

4. "df -h" output:
/dev/sdb2 9.6T 5.9T 3.2T 66% /data0 

5. "lscp" output:
                 CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
                   2  2012-12-03 14:03:01   ss    -           14          3
              580481  2012-12-20 16:11:25   cp    -          293     697667
              580482  2012-12-20 16:11:25   cp    -          130     697666
              580483  2012-12-20 16:11:25   cp    -          225     697664
              580484  2012-12-20 16:11:25   cp    -          143     697663
              580485  2012-12-20 16:11:26   cp    -          311     697659
              580486  2012-12-20 16:11:27   cp    -          328     697657
              580487  2012-12-20 16:11:27   cp    -          263     697655
              580488  2012-12-20 16:11:27   cp    -          118     697653
              580489  2012-12-20 16:11:28   cp    -          230     697651
              580490  2012-12-20 16:11:28   cp    -          272     697649
              580491  2012-12-20 16:11:28   cp    -          148     697648
              580492  2012-12-20 16:11:29   cp    -          139     697647
              580493  2012-12-20 16:11:29   cp    -          273     697645
              580494  2012-12-20 16:11:29   cp    -          147     697644
              580495  2012-12-20 16:11:30   cp    -          271     697641
              580496  2012-12-20 16:11:31   cp    -          526     697636
              580497  2012-12-20 16:11:34   cp    -         1684     697625
              580498  2012-12-20 16:11:37   cp    -          983     697609
              580499  2012-12-20 16:11:38   cp    -          421     697605
              580500  2012-12-20 16:11:40   cp    -         1019     697594
              580501  2012-12-20 16:11:40   cp    -          143     697593
              580502  2012-12-20 16:11:41   cp    -         1536     697592
              580503  2012-12-20 16:11:41   cp    -          373     697590
              580504  2012-12-20 16:11:42   cp    -          312     697587
              580505  2012-12-20 16:11:42   cp    -          102     697586
              580506  2012-12-20 16:11:43   cp    -          274     697584
              580507  2012-12-20 16:11:43   cp    -          270     697582
              580508  2012-12-20 16:11:43   cp    -          118     697581
              580509  2012-12-20 16:11:43   cp    -          133     697580
              580510  2012-12-20 16:11:44   cp    -          321     697578
              580511  2012-12-20 16:11:44   cp    -          245     697576
              580512  2012-12-20 16:11:45   cp    -          394     697573
              580513  2012-12-20 16:11:45   cp    -          121     697572
              580514  2012-12-20 16:11:45   cp    -          245     697569
              580515  2012-12-20 16:11:52   cp    -         2705     697543
              580516  2012-12-20 16:11:55   cp    -         2590     697504
              580517  2012-12-20 16:11:59   cp    -         2418     697453
              580518  2012-12-20 16:12:00   cp    -          866     697436
              580519  2012-12-20 16:12:01   cp    -          864     697420
              580520  2012-12-20 16:12:05   cp    -         1765     697357
              580521  2012-12-20 16:12:05   cp    -          120     697356
              580522  2012-12-20 16:12:06   cp    -          820     697332
              580523  2012-12-20 16:12:09   cp    -         1642     697174
              580524  2012-12-20 16:12:09   cp    -           89     697173
              580525  2012-12-20 16:12:10   cp    -           56     697173
              580526  2012-12-20 16:12:42   cp    -          763     697173

6. "lssu" output:
	It is too large to include inline; please download it from: http://d.pr/f/vnoR

7. "nilfs-tune -l" output (superblock content):

nilfs-tune 2.1.4
Filesystem volume name:	  (none)
Filesystem UUID:	  dcfb7152-a342-48d0-a712-212a3062395e
Filesystem magic number:  0x3434
Filesystem revision #:	  2.0
Filesystem features:      (none)
Filesystem state:	  invalid or mounted,error
Filesystem OS type:	  Linux
Block size:		  4096
Filesystem created:	  Mon Dec  3 13:56:51 2012
Last mount time:	  Thu Dec 20 17:44:03 2012
Last write time:	  Thu Dec 20 17:44:03 2012
Mount count:		  13
Maximum mount count:	  50
Reserve blocks uid:	  0 (user root)
Reserve blocks gid:	  0 (group root)
First inode:		  11
Inode size:		  128
DAT entry size:		  32
Checkpoint size:	  192
Segment usage size:	  16
Number of segments:	  1246464
Device size:		  10456104173568
First data block:	  1
# of blocks per segment:  2048
Reserved segments %:	  5
Last checkpoint #:	  580526
Last block address:	  1040286376
Last sequence #:	  1753809
Free blocks count:	  973875200
Commit interval:	  60
# of blks to create seg:  0
CRC seed:		  0x3adfb6c3
CRC check sum:		  0x8468fbbf
CRC check data size:	  0x00000118

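As a quick consistency check on the superblock fields above, here is a minimal Python sketch that derives the segment size, the total block count, and the free fraction from the reported values. The helper name summarize_geometry is hypothetical; it only does arithmetic on the numbers printed by nilfs-tune and does not read the device.

```python
# Values copied from the "nilfs-tune -l" output above.
BLOCK_SIZE = 4096              # "Block size"
BLOCKS_PER_SEGMENT = 2048      # "# of blocks per segment"
NUM_SEGMENTS = 1246464         # "Number of segments"
DEVICE_SIZE = 10456104173568   # "Device size" (bytes)
FREE_BLOCKS = 973875200        # "Free blocks count"
LAST_BLOCK_ADDR = 1040286376   # "Last block address"


def summarize_geometry():
    """Derive sizes from the superblock fields (hypothetical helper)."""
    total_blocks = NUM_SEGMENTS * BLOCKS_PER_SEGMENT
    covered_bytes = total_blocks * BLOCK_SIZE
    return {
        "segment_bytes": BLOCK_SIZE * BLOCKS_PER_SEGMENT,
        "total_blocks": total_blocks,
        "covered_bytes": covered_bytes,
        # Bytes of the device not covered by whole segments.
        "uncovered_bytes": DEVICE_SIZE - covered_bytes,
        "free_fraction": FREE_BLOCKS / total_blocks,
        "last_block_in_range": LAST_BLOCK_ADDR < total_blocks,
    }


if __name__ == "__main__":
    for name, value in summarize_geometry().items():
        print(f"{name}: {value}")
```

The fields look self-consistent: segments are 8 MiB each, the segments cover all but 6 MiB of the device, and roughly 38% of the blocks are free, which is broadly in line with the "df -h" output if the 5% reserved segments are excluded from the available space (I have not checked how df accounts for them).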

I found this in /var/log/messages; perhaps it is related to the bad btree node:

Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
Dec 18 15:55:02 localhost kernel: Call Trace:
Dec 18 15:55:02 localhost kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
Dec 18 15:55:02 localhost kernel: <EOI>  [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
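To make the first line of this trace easier to read, here is a small Python sketch that decodes the "order" and "mode" fields. It rests on two assumptions that are not stated in the log: 4 KiB pages (the usual x86_64 base page size), and the GFP flag bit values as I recall them from 2.6.32-era include/linux/gfp.h. The function name decode_alloc_failure is hypothetical.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages on x86_64

# Assumption: flag bit values recalled from 2.6.32-era include/linux/gfp.h.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

PATTERN = re.compile(
    r"(\S+): page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def decode_alloc_failure(line):
    """Return the task name, request size and flag names for one log line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    order = int(match.group(2))
    mode = int(match.group(3), 16)
    flags = [name for bit, name in GFP_BITS.items() if mode & bit]
    unknown = mode & ~sum(GFP_BITS)  # bits this sketch does not know about
    return {
        "comm": match.group(1),
        "order": order,
        "bytes": PAGE_SIZE << order,   # 2**order contiguous pages
        "flags": flags,
        "unknown_bits": unknown,
        "may_sleep": bool(mode & 0x10),
    }


if __name__ == "__main__":
    sample = ("Dec 18 15:55:02 localhost kernel: rsync: page allocation "
              "failure. order:1, mode:0x20")
    print(decode_alloc_failure(sample))
```

Under those assumptions, the failure was an 8 KiB (two contiguous pages) allocation that was not allowed to sleep. Note also that the frames between <IRQ> and <EOI> are interrupt context: the failing allocation is in the network receive path (sk_clone), and the nilfs2 frames after <EOI> appear to be the write that was interrupted, not the code that asked for the memory.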



On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>> Hi,
>> 
>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>> 
>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>> Dec 20 16:03:55 localhost kernel:
>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>> Dec 20 16:03:55 localhost kernel:
>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>> 
>> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>> 
>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>> Dec 20 16:12:08 localhost kernel:
>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>> 
>> I tried a third remount, but it failed. The server went down and was restarted.
>> 
>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>> Dec 20 16:12:42 localhost kernel:
>> 
> 
> Yes, that is bad. In earlier reports, a remount solved the problem.
> 
> So, is your NILFS2 volume currently mounted read-only?
> 
> Could you share more details about your environment? I need them to
> understand the situation and to try to reproduce it. I need to know:
> 1. Linux kernel version.
> 2. nilfs-utils version.
> 3. "mount" output.
> 4. "df -h" output.
> 5. "lscp" output.
> 6. "lssu" output.
> 7. "nilfs-tune -l" output (superblock content)
> 
>> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>> 
> 
> The latest version of nilfs-utils is 2.1.4. fsck.nilfs2 is currently at
> an early stage of development. The "v4" is the version of the
> fsck.nilfs2 patchset. You can try fsck.nilfs2 after applying this
> patchset to the nilfs-utils 2.1.4 source code. But fsck.nilfs2 can check
> only superblocks and segment summary headers, and it cannot recover
> anything yet. So I think it will be useless for you.
> 
> With the best regards,
> Vyacheslav Dubeyko.
> 
>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
>> 
>>> Hi,
>>> 
>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>> Hello.
>>>> 	My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
>>>> 
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> ……………………………………………………
>>>> 
>>>> 	How can I fix this? There are 6 TiB of data on my disk, and I don't want to reformat it.
>>>> 	I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>> 
>>> 
>>> Yes, this issue was reported earlier. As I understand it, you can
>>> simply remount your filesystem in read-write mode and continue using
>>> your NILFS2 filesystem.
>>> 
>>> If you encounter any trouble with remounting, please report it.
>>> 
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>> 
>>> 
>>>> Elmer Zhang
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> 
>> 
> 
> 

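For anyone reading the quoted "bad btree node" messages, here is a minimal Python sketch of why those header values look invalid for a 4096-byte block. The node layout constants (an 8-byte node header, 8 bytes of extra padding, 8-byte keys and pointers, and node levels starting at 1) are assumptions recalled from the kernel's NILFS2 on-disk format headers, not something stated in this thread, and the mapping from block number to segment assumes plain integer division by the blocks-per-segment value from the superblock. The function names are hypothetical.

```python
import re

BLOCK_SIZE = 4096           # from "nilfs-tune -l"
BLOCKS_PER_SEGMENT = 2048   # from "nilfs-tune -l"

# Assumptions about the on-disk B-tree node format (not from this thread):
NODE_HEADER_SIZE = 8   # flags (1) + level (1) + nchildren (2) + pad (4)
EXTRA_PAD_SIZE = 8
KEY_SIZE = 8
PTR_SIZE = 8
LEVEL_NODE_MIN = 1     # level 0 would be the data level, not a node

PATTERN = re.compile(
    r"bad btree node \(blocknr=(\d+)\): level = (\d+), "
    r"flags = (0x[0-9a-fA-F]+), nchildren = (\d+)"
)


def max_children(block_size):
    """Largest child count that fits in one node block."""
    usable = block_size - NODE_HEADER_SIZE - EXTRA_PAD_SIZE
    return usable // (KEY_SIZE + PTR_SIZE)


def check_node_message(line):
    """Parse one kernel message and list the fields that look invalid."""
    match = PATTERN.search(line)
    if match is None:
        return None
    blocknr = int(match.group(1))
    level = int(match.group(2))
    nchildren = int(match.group(4))
    problems = []
    if level < LEVEL_NODE_MIN:
        problems.append("level below minimum node level")
    if nchildren > max_children(BLOCK_SIZE):
        problems.append("nchildren exceeds block capacity")
    return {
        "blocknr": blocknr,
        "segment": blocknr // BLOCKS_PER_SEGMENT,
        "offset_in_segment": blocknr % BLOCKS_PER_SEGMENT,
        "problems": problems,
    }


if __name__ == "__main__":
    sample = ("NILFS: bad btree node (blocknr=710153406): level = 0, "
              "flags = 0x2, nchildren = 25088")
    print(max_children(BLOCK_SIZE))
    print(check_node_message(sample))
```

Under these assumptions a node block can hold at most 255 children, so nchildren = 25088 cannot be a valid header, and level = 0 is also out of range. If the division assumption holds, the block falls in segment 346754, which may help when looking that segment up in the "lssu" output.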