Re: NILFS: bad btree node

Hi, I use kmod-nilfs2-0.4.3-1.el6.x86_64

On 2012-12-22, at 22:12, Seiji Kihara <kihara@xxxxxxxx> wrote:

> Hello,
> 
> (2012/12/20 19:16), 张 磊 wrote:
>> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
> 
> If you use nilfs2 kernel module for RHEL 6 clones,
> 'rpm -q kmod-nilfs2' will help.
> 
> http://www.nilfs.org/en/pkg_centos.html
> https://github.com/nilfs-dev/nilfs2-kmod-centos6
> 
> Regards,
> 
> Seiji
> 
>> 2. nilfs-utils version: nilfs-utils-2.1.4
>> 3. "mount" output:
>> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>> 
>> 4. "df -h" output:
>> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0
>> 
>> 5. "lscp" output:
>>                  CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
>>                    2  2012-12-03 14:03:01   ss    -           14          3
>>               580481  2012-12-20 16:11:25   cp    -          293     697667
>>               580482  2012-12-20 16:11:25   cp    -          130     697666
>>               580483  2012-12-20 16:11:25   cp    -          225     697664
>>               580484  2012-12-20 16:11:25   cp    -          143     697663
>>               580485  2012-12-20 16:11:26   cp    -          311     697659
>>               580486  2012-12-20 16:11:27   cp    -          328     697657
>>               580487  2012-12-20 16:11:27   cp    -          263     697655
>>               580488  2012-12-20 16:11:27   cp    -          118     697653
>>               580489  2012-12-20 16:11:28   cp    -          230     697651
>>               580490  2012-12-20 16:11:28   cp    -          272     697649
>>               580491  2012-12-20 16:11:28   cp    -          148     697648
>>               580492  2012-12-20 16:11:29   cp    -          139     697647
>>               580493  2012-12-20 16:11:29   cp    -          273     697645
>>               580494  2012-12-20 16:11:29   cp    -          147     697644
>>               580495  2012-12-20 16:11:30   cp    -          271     697641
>>               580496  2012-12-20 16:11:31   cp    -          526     697636
>>               580497  2012-12-20 16:11:34   cp    -         1684     697625
>>               580498  2012-12-20 16:11:37   cp    -          983     697609
>>               580499  2012-12-20 16:11:38   cp    -          421     697605
>>               580500  2012-12-20 16:11:40   cp    -         1019     697594
>>               580501  2012-12-20 16:11:40   cp    -          143     697593
>>               580502  2012-12-20 16:11:41   cp    -         1536     697592
>>               580503  2012-12-20 16:11:41   cp    -          373     697590
>>               580504  2012-12-20 16:11:42   cp    -          312     697587
>>               580505  2012-12-20 16:11:42   cp    -          102     697586
>>               580506  2012-12-20 16:11:43   cp    -          274     697584
>>               580507  2012-12-20 16:11:43   cp    -          270     697582
>>               580508  2012-12-20 16:11:43   cp    -          118     697581
>>               580509  2012-12-20 16:11:43   cp    -          133     697580
>>               580510  2012-12-20 16:11:44   cp    -          321     697578
>>               580511  2012-12-20 16:11:44   cp    -          245     697576
>>               580512  2012-12-20 16:11:45   cp    -          394     697573
>>               580513  2012-12-20 16:11:45   cp    -          121     697572
>>               580514  2012-12-20 16:11:45   cp    -          245     697569
>>               580515  2012-12-20 16:11:52   cp    -         2705     697543
>>               580516  2012-12-20 16:11:55   cp    -         2590     697504
>>               580517  2012-12-20 16:11:59   cp    -         2418     697453
>>               580518  2012-12-20 16:12:00   cp    -          866     697436
>>               580519  2012-12-20 16:12:01   cp    -          864     697420
>>               580520  2012-12-20 16:12:05   cp    -         1765     697357
>>               580521  2012-12-20 16:12:05   cp    -          120     697356
>>               580522  2012-12-20 16:12:06   cp    -          820     697332
>>               580523  2012-12-20 16:12:09   cp    -         1642     697174
>>               580524  2012-12-20 16:12:09   cp    -           89     697173
>>               580525  2012-12-20 16:12:10   cp    -           56     697173
>>               580526  2012-12-20 16:12:42   cp    -          763     697173
>> 
>> 6. "lssu" output:
>> 	It is too large to include here; please download it from http://d.pr/f/vnoR
>> 
>> 7. "nilfs-tune -l" output (superblock content):
>> 
>> nilfs-tune 2.1.4
>> Filesystem volume name:	  (none)
>> Filesystem UUID:	  dcfb7152-a342-48d0-a712-212a3062395e
>> Filesystem magic number:  0x3434
>> Filesystem revision #:	  2.0
>> Filesystem features:      (none)
>> Filesystem state:	  invalid or mounted,error
>> Filesystem OS type:	  Linux
>> Block size:		  4096
>> Filesystem created:	  Mon Dec  3 13:56:51 2012
>> Last mount time:	  Thu Dec 20 17:44:03 2012
>> Last write time:	  Thu Dec 20 17:44:03 2012
>> Mount count:		  13
>> Maximum mount count:	  50
>> Reserve blocks uid:	  0 (user root)
>> Reserve blocks gid:	  0 (group root)
>> First inode:		  11
>> Inode size:		  128
>> DAT entry size:		  32
>> Checkpoint size:	  192
>> Segment usage size:	  16
>> Number of segments:	  1246464
>> Device size:		  10456104173568
>> First data block:	  1
>> # of blocks per segment:  2048
>> Reserved segments %:	  5
>> Last checkpoint #:	  580526
>> Last block address:	  1040286376
>> Last sequence #:	  1753809
>> Free blocks count:	  973875200
>> Commit interval:	  60
>> # of blks to create seg:  0
>> CRC seed:		  0x3adfb6c3
>> CRC check sum:		  0x8468fbbf
>> CRC check data size:	  0x00000118
>> 
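The superblock figures above can be cross-checked with a little arithmetic. This is a rough sketch, not an authoritative description of how NILFS2 accounts for space: it assumes the reserve is simply the stated percentage of all segments and that it is subtracted from the free space before "df" reports it as available. All constants are copied from the "nilfs-tune -l" listing; the variable names are illustrative.

```python
# Values copied from the "nilfs-tune -l" output quoted above.
BLOCK_SIZE = 4096            # "Block size"
BLOCKS_PER_SEGMENT = 2048    # "# of blocks per segment"
NUM_SEGMENTS = 1246464       # "Number of segments"
DEVICE_SIZE = 10456104173568 # "Device size" in bytes
FREE_BLOCKS = 973875200      # "Free blocks count"
RESERVED_PERCENT = 5         # "Reserved segments %"

TIB = 2**40

# Size of one segment and of the area covered by whole segments.
segment_bytes = BLOCK_SIZE * BLOCKS_PER_SEGMENT
segmented_bytes = NUM_SEGMENTS * segment_bytes

# Space at the end of the device that does not fill a whole segment.
leftover_bytes = DEVICE_SIZE - segmented_bytes

# Assumption: the reserve is RESERVED_PERCENT of all segments and is
# subtracted from the free space before it is shown as available.
free_bytes = FREE_BLOCKS * BLOCK_SIZE
reserved_bytes = segmented_bytes * RESERVED_PERCENT // 100
available_bytes = free_bytes - reserved_bytes

print(f"segment size:   {segment_bytes // 2**20} MiB")
print(f"leftover space: {leftover_bytes // 2**20} MiB")
print(f"free space:     {free_bytes / TIB:.2f} TiB")
print(f"available:      {available_bytes / TIB:.2f} TiB")
```

Under these assumptions the available space comes out at a little over 3.1 TiB, which is close to the 3.2T in the "df -h" output above.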
>> 
>> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>> 
>> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
>> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
>> Dec 18 15:55:02 localhost kernel: Call Trace:
>> Dec 18 15:55:02 localhost kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
>> Dec 18 15:55:02 localhost kernel: <EOI>  [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>> 
>> 
>> 
>> On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
>> 
>>> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>>>> Hi,
>>>> 
>>>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>>>> 
>>>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>>>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 20 16:03:55 localhost kernel:
>>>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 20 16:03:55 localhost kernel:
>>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>>>> 
>>>> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>>>> 
>>>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>>>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>>>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>>>> Dec 20 16:12:08 localhost kernel:
>>>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>>>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>>>> 
>>>> I tried a third remount, but it failed. The server went down and was restarted.
>>>> 
>>>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>>>> Dec 20 16:12:42 localhost kernel:
>>>> 
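The "bad btree node" message carries enough detail to locate the block and to see that the node header is implausible. The following is a minimal sketch, not the kernel's own check. It assumes that the segment number is the block number divided by the blocks per segment, and that each child in a node needs at least one 8-byte key and one 8-byte pointer, so that a 4096-byte block cannot hold more than 256 children. The function names are illustrative; the block size and blocks per segment come from the "nilfs-tune -l" output quoted earlier.

```python
import re

# Pattern for the "bad btree node" kernel message quoted in this thread.
BAD_NODE_RE = re.compile(
    r"NILFS: bad btree node \(blocknr=(?P<blocknr>\d+)\): "
    r"level = (?P<level>\d+), flags = (?P<flags>0x[0-9a-fA-F]+), "
    r"nchildren = (?P<nchildren>\d+)"
)

BLOCK_SIZE = 4096          # "Block size" from nilfs-tune -l
BLOCKS_PER_SEGMENT = 2048  # "# of blocks per segment" from nilfs-tune -l
PAIR_SIZE = 16             # assumption: 8-byte key + 8-byte pointer per child


def parse_bad_node(line):
    """Extract the numeric fields from one log line, or return None."""
    m = BAD_NODE_RE.search(line)
    if m is None:
        return None
    return {
        "blocknr": int(m.group("blocknr")),
        "level": int(m.group("level")),
        "flags": int(m.group("flags"), 16),
        "nchildren": int(m.group("nchildren")),
    }


def describe(node):
    """Locate the block and compare nchildren with a loose upper bound."""
    # Assumption: segments are laid out back to back from block 0.
    segnum, offset = divmod(node["blocknr"], BLOCKS_PER_SEGMENT)
    max_pairs = BLOCK_SIZE // PAIR_SIZE
    return {
        "segment": segnum,
        "offset_in_segment": offset,
        "max_pairs_per_block": max_pairs,
        "nchildren_fits": node["nchildren"] <= max_pairs,
    }


line = ("Dec 20 16:03:55 localhost kernel: NILFS: bad btree node "
        "(blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088")
node = parse_bad_node(line)
print(node)
print(describe(node))
```

For the message above this gives segment 346754, block offset 1214, and an nchildren value far beyond what one block could hold. The segment number can then be looked up in the "lssu" output.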
>>> Yes, this is bad. In earlier reports, a remount resolved the problem.
>>> 
>>> So is your NILFS2 volume now mounted read-only?
>>> 
>>> Could you share more details about your environment? I need them to
>>> understand the situation and to try to reproduce it. I need to know:
>>> 1. Linux kernel version.
>>> 2. nilfs-utils version.
>>> 3. "mount" output.
>>> 4. "df -h" output.
>>> 5. "lscp" output.
>>> 6. "lssu" output.
>>> 7. "nilfs-tune -l" output (superblock content)
>>> 
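The seven items requested above can be gathered in one pass. This is a minimal sketch under stated assumptions: Python 3.7 or later, an RPM-based system where the package is named nilfs-utils, and lscp, lssu, and nilfs-tune being given the device as an argument. The helper names are illustrative, and the device and mount point have to be supplied by the caller.

```python
import subprocess


def diagnostic_commands(device, mount_point):
    """Return one command per requested item, in the order of the list above."""
    return [
        ["uname", "-r"],                # 1. Linux kernel version
        ["rpm", "-q", "nilfs-utils"],   # 2. nilfs-utils version (RPM systems)
        ["mount"],                      # 3. "mount" output
        ["df", "-h", mount_point],      # 4. "df -h" output
        ["lscp", device],               # 5. checkpoint list
        ["lssu", device],               # 6. segment usage list
        ["nilfs-tune", "-l", device],   # 7. superblock content
    ]


def collect(device, mount_point, run=subprocess.run):
    """Run every command and return a single text report."""
    report = []
    for cmd in diagnostic_commands(device, mount_point):
        try:
            # check=False: a failing command is reported, not raised.
            result = run(cmd, capture_output=True, text=True, check=False)
            output = result.stdout + result.stderr
        except FileNotFoundError:
            output = "(command not found)\n"
        report.append("$ " + " ".join(cmd) + "\n" + output)
    return "\n".join(report)
```

For the system in this thread, the call would be collect("/dev/sdb2", "/data0"), run as root, with the result saved to a file and attached to the reply.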
>>>> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>>>> 
>>> The latest version of nilfs-utils is 2.1.4. fsck.nilfs2 is currently at
>>> an early stage of development; "v4" is the version of the fsck.nilfs2
>>> patchset, not of nilfs-utils. You can try fsck.nilfs2 after applying
>>> this patchset to the nilfs-utils 2.1.4 source code. But fsck.nilfs2 can
>>> check only superblocks and segment summary headers, and it cannot
>>> perform a complete recovery. So I think it will be useless for you.
>>> 
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>> 
>>>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>>>> Hello.
>>>>>> 	My nilfs filesystem suddenly became read-only. I saw these logs in /var/log/messages:
>>>>>> 
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> ……………………………………………………
>>>>>> 
>>>>>> 	How can I fix this? There are 6 TiB of data on my disk, and I don't want to format it.
>>>>>> 	I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid it? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>>>> 
>>>>> Yes, this issue has been reported before. As I understand it, you can
>>>>> simply remount your filesystem in read-write mode and continue using
>>>>> your NILFS2 filesystem.
>>>>> 
>>>>> If you encounter any trouble with remounting, please report it.
>>>>> 
>>>>> With the best regards,
>>>>> Vyacheslav Dubeyko.
>>>>> 
>>>>> 
>>>>>> Elmer Zhang
> 
> -- 
> Seiji Kihara
> 
