Re: A lot of NILFS: bad btree node messages (readonly fs)

On 2013-1-8, at 20:52, Vyacheslav Dubeyko <slava@xxxxxxxxxxx> wrote:

> Hi guys,
> 
> I have been trying to reproduce the issue for the last three days, without success. I tried different workloads and different environments. As far as I know, all of you can currently reproduce the issue, so I have some additional questions.
> 
> 1. All of you see messages like these:
> 
> Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node (blocknr=26229286): level = 67, flags = 0xee, nchildren = 40
> Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2): nilfs_bmap_lookup_contig: broken bmap (inode number=102230)
> 
> As I understand it, you still get the message for a specific block number (for example, blocknr=26229286) during remount, but you no longer get the message for that block number after umount and mount again. You can, however, get error messages for a different block number afterwards. Am I correct?

I got error messages for the same block after remount. That is to say, the bad block is always bad.
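
For anyone who wants to check the same thing on their own logs, here is a minimal sketch that counts how often each block number appears in "bad btree node" messages. It assumes the syslog line format quoted above; the function name and the regular expression are illustrative only and are not part of any NILFS tool.

```python
import re
from collections import Counter

# Pattern for the kernel message quoted above. The exact format is an
# assumption taken from those sample lines; adjust it if your logs differ.
BAD_NODE_RE = re.compile(
    r"NILFS: bad btree node \(blocknr=(?P<blocknr>\d+)\): "
    r"level = (?P<level>\d+), flags = (?P<flags>0x[0-9a-fA-F]+), "
    r"nchildren = (?P<nchildren>\d+)"
)


def count_bad_blocks(lines):
    """Return a Counter mapping block number -> number of bad-node messages."""
    counts = Counter()
    for line in lines:
        match = BAD_NODE_RE.search(line)
        if match:
            counts[int(match.group("blocknr"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan 03 22:36:38 [kernel] [  953.289973] NILFS: bad btree node "
        "(blocknr=26229286): level = 67, flags = 0xee, nchildren = 40",
        "Jan 03 22:36:38 [kernel] [  953.289976] NILFS error (device sda2): "
        "nilfs_bmap_lookup_contig: broken bmap (inode number=102230)",
    ]
    for blocknr, hits in count_bad_blocks(sample).items():
        print(blocknr, hits)
```

A block number that keeps showing up across mounts would match what I see here: the same block is reported every time.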

> 2. As I understand it, you have a corrupted file on your volume after such an error message (for example, for inode number=102230).
> 

Yes. As soon as I open the corrupted file and read it, the kernel reports a bad btree node and remounts the filesystem read-only.
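
To see how far a read gets before it fails, something like the following rough sketch could be used. The helper name is hypothetical. Note that on an affected volume the read itself triggers the read-only remount described above, so it should only be run where that is acceptable.

```python
import errno
import os


def read_until_error(path, chunk_size=4096):
    """Read a file sequentially.

    Returns (bytes_read, error_name), where error_name is None if the whole
    file was read, or the symbolic errno name (e.g. "EIO") of the first
    failing read.
    """
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = os.read(fd, chunk_size)
            except OSError as exc:
                # Report where the read failed and with which error.
                return bytes_read, errno.errorcode.get(exc.errno, str(exc.errno))
            if not chunk:
                # End of file reached without any error.
                return bytes_read, None
            bytes_read += len(chunk)
    finally:
        os.close(fd)
```

The byte offset it returns gives a rough idea of which part of the file maps to the broken btree node.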

> On 2013-1-6, at 12:46, Elmer Zhang <freeboy6716@xxxxxxxxx> wrote:
> 
>> I have found the corrupted file using inode number:
>> [root@yf237 data0]# cat mysql6003/app_wyxgrab/weibo_rank.MYI > /dev/null 
>> cat: mysql6003/app_wyxgrab/weibo_rank.MYI: Input/output error
> 
> Could you share the strace output of the "cat" command for such a corrupted file? The syslog may also contain some interesting details from the time the "cat" command runs. Could you check the syslog for error messages during such an attempt?
> 

output of strace for cat: http://d.pr/n/Qboc
error messages during cat: http://d.pr/n/snOt
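
For others hitting the same error: the inode number from the "broken bmap" message can be mapped back to a path with `find <mountpoint> -inum <number>`, or with a small sketch like the one below. The function name is hypothetical; it only walks the directory tree and compares inode numbers, so it never reads file contents.

```python
import os


def find_paths_by_inode(root, inode):
    """Return all paths under root whose inode number equals inode.

    Only entries on the same device as root are reported, since inode
    numbers are unique per filesystem only.
    """
    matches = []
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # Skip entries that cannot be stat'ed.
                continue
            if st.st_ino == inode and st.st_dev == root_dev:
                matches.append(path)
    return matches
```

Called as `find_paths_by_inode("/data0", 102230)` (the mount point here is only a guess based on the shell prompt above), it returns a list that contains more than one path if the file has hard links.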

> 3. Could you share the configuration file of your kernel (.config)? I suspect that your environment may have some special configuration that mine doesn't.
> 

content of /boot/config-2.6.32-220.13.1.el6.x86_64 : http://d.pr/n/qTQk

> 4. Could you share the content of the nilfs_cleanerd.conf file for the NILFS2 partition that has this issue? Sorry if I am asking about it again.
> 

content of nilfs_cleanerd.conf: http://d.pr/n/YIwj

> 5. Did you have any sudden power-off before you first encountered the issue?
> 

No.

> 6. I understand that it may not be easy, but could you share details from your system log for the first occurrence of the issue? I only need details about how the system was running before the issue.
> 

I found some backtrace in syslog: http://d.pr/n/ddZd

> 7. I analyzed the raw dump of the segment that I received from Elmer Zhang. My current feeling is that the driver tries to take a block that has already been filled by GC, but the issue needs deeper investigation. At the moment I don't understand how the issue can be triggered. Successfully reproducing the issue is half of the success.
> 
> Thanks,
> Vyacheslav Dubeyko.
> 

---
Elmer Zhang
