Re: XFS Metadata corruption while activating OSD

I’m sorry for my late reply, and thank you for yours.
Yes, this error only occurs when the backend is XFS.
Ext4 + BlueStore does not trigger the error.
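
For reference, a BlueStore OSD can be prepared directly with ceph-disk on Luminous; a minimal sketch, with /dev/sda standing in for the data disk (the device name is only an example):

  # prepare a BlueStore OSD on the whole device, then activate its data partition
  ceph-disk prepare --bluestore /dev/sda
  ceph-disk activate /dev/sda1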



> On Mar 12, 2018, at 6:31 PM, Peter Woodman <peter@xxxxxxxxxxxx> wrote:
> 
> From what I've heard, XFS has problems on ARM. Use btrfs, or (I
> believe?) ext4+bluestore will work.
> 
> On Sun, Mar 11, 2018 at 9:49 PM, Christian Wuerdig
> <christian.wuerdig@xxxxxxxxx> wrote:
>> Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of
>> storage? Literally everything posted on this list in relation to HW
>> requirements and related problems will tell you that this simply isn't going
>> to work. The slightest hint of a problem will simply kill the OSD nodes with
>> OOM. Have you tried smaller disks - like 1TB models (or even smaller,
>> like 256GB SSDs) - to see if the same problem persists?
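>> 
>> (For scale: a commonly cited rule of thumb is roughly 1 GB of RAM per
>> 1 TB of OSD storage, so:
>> 
>>   2 disks x 10 TB x ~1 GB/TB ≈ 20 GB of RAM suggested, vs. 2 GB installed
>> 
>> i.e. an order of magnitude short.)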
>> 
>> 
>> On Tue, 6 Mar 2018 at 10:51, 赵赵贺东 <zhaohedong@xxxxxxxxx> wrote:
>>> 
>>> Hello ceph-users,
>>> 
>>> It is a really, really tough problem for our team.
>>> We have investigated the problem for a long time and tried many things,
>>> but we cannot solve it; even the root cause of the problem is still
>>> unclear to us!
>>> So any solution/suggestion/opinion whatsoever will be highly
>>> appreciated!!!
>>> 
>>> Problem Summary:
>>> When we activate an OSD, metadata corruption appears on the disk being
>>> activated; the probability is 100%!
>>> 
>>> Admin nodes & MON node:
>>> Platform: X86
>>> OS: Ubuntu 16.04
>>> Kernel: 4.12.0
>>> Ceph: Luminous 12.2.2
>>> 
>>> OSD nodes:
>>> Platform: armv7
>>> OS:       Ubuntu 14.04
>>> Kernel:   4.4.39
>>> Ceph:     Luminous 12.2.2
>>> Disk:     10TB + 10TB
>>> Memory:   2GB
>>> 
>>> Deploy log:
>>> 
>>> 
>>> dmesg log: (sorry, the arms001-01 dmesg log has been lost, but the
>>> metadata corruption error messages on arms003-10 are the same as on
>>> arms001-01)
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.534232] XFS (sda1): Unmount and run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.539100] XFS (sda1): First 64 bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.545504] eb82f000: 58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.553569] eb82f010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.561624] eb82f020: fc 4e e3 89 50 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.569706] eb82f030: 00 00 00 00 80 00 00 07 ff ff ff ff ff ff ff ff  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.577778] XFS (sda1): metadata I/O error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.602944] XFS (sda1): Metadata corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0x48b9ff80
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.614170] XFS (sda1): Unmount and run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.619030] XFS (sda1): First 64 bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.625403] eb901000: 58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.633441] eb901010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.641474] eb901020: fc 4e e3 89 50 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.649519] eb901030: 00 00 00 00 80 00 00 07 ff ff ff ff ff ff ff ff  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.657554] XFS (sda1): metadata I/O error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.675056] XFS (sda1): Metadata corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0x48b9ff80
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.686228] XFS (sda1): Unmount and run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.691054] XFS (sda1): First 64 bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.697425] eb901000: 58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.705459] eb901010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.713489] eb901020: fc 4e e3 89 50 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.721520] eb901030: 00 00 00 00 80 00 00 07 ff ff ff ff ff ff ff ff  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.729558] XFS (sda1): metadata I/O error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.741953] XFS (sda1): Metadata corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0x48b9ff80
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.753139] XFS (sda1): Unmount and run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.757955] XFS (sda1): First 64 bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.764336] eb901000: 58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.772365] eb901010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.780395] eb901020: fc 4e e3 89 50 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.788417] eb901030: 00 00 00 00 80 00 00 07 ff ff ff ff ff ff ff ff  ................
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.796514] XFS (sda1): metadata I/O error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
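>>> 
>>> (Note: error 117 is EUCLEAN, "Structure needs cleaning", i.e. the kernel
>>> has decided the on-disk metadata is inconsistent. As the log suggests, the
>>> damage can be inspected without modifying the disk; a minimal sketch,
>>> assuming the filesystem is on /dev/sda1:
>>> 
>>>   umount /dev/sda1          # the check requires an unmounted filesystem
>>>   xfs_repair -n /dev/sda1   # -n = no-modify mode, report problems only
>>> )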
>>> 
>>> Our attempts to solve the problem:
>>> 1. Deploying the OSD manually has been confirmed to produce the same
>>> error.
>>> 2. We browsed the kernel bug-fix logs, but found no related fix since
>>> kernel 4.4.39.
>>> 3. After upgrading xfsprogs from 3.1.9 to 4.15.0 the error number
>>> changed, but the disk is still corrupted while activating the OSD:
>>> 
>>> [2912641.987937] XFS (sda1): Metadata CRC error detected at
>>> xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0xfffffff0
>>> [2912641.999203] XFS (sda1): Unmount and run xfs_repair
>>> [2912642.004202] XFS (sda1): First 64 bytes of corrupted metadata buffer:
>>> [2912642.010759] e689a000: 58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb
>>> XFSB.........s..
>>> [2912642.018958] e689a010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> ................
>>> [2912642.027177] e689a020: 61 7b 64 0d fa fe 41 14 bf ea 90 32 6c 73 e5 ad
>>> a{d...A....2ls..
>>> [2912642.035388] e689a030: 00 00 00 00 50 00 00 08 ff ff ff ff ff ff ff ff
>>> ....P...........
>>> [2912642.043630] XFS (sda1): metadata I/O error: block 0xfffffff0
>>> ("xfs_trans_read_buf_map") error 74 numblks 8
>>> [2912642.060390] XFS (sda1): Metadata CRC error detected at
>>> xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0xfffffff0
>>> [2912642.071673] XFS (sda1): Unmount and run xfs_repair
>>> 
>>> 4. Using the disk in an OSD node on x86 has been confirmed not to
>>> trigger the problem.
>>> 5. Partitioning the disk with sgdisk, formatting it with mkfs.xfs,
>>> mounting it, running some read/write dd tests and then unmounting has
>>> been confirmed not to trigger the problem (a sketch of this test
>>> follows this list).
>>> 6. Changing the Ceph version from 12.2.2 to 10.2.10 has been confirmed
>>> not to help; the problem still exists.
>>>    10.2.10 deploy log:
>>> 
>>>    The corruption error is the same as with 12.2.2.
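>>> 
>>> (A minimal sketch of the test in item 5, with /dev/sda as the disk and
>>> /mnt as a scratch mount point; sizes and names are only examples:
>>> 
>>>   sgdisk -n 1:0:0 /dev/sda                 # one partition spanning the disk
>>>   mkfs.xfs -f /dev/sda1                    # fresh XFS filesystem
>>>   mount /dev/sda1 /mnt
>>>   dd if=/dev/zero of=/mnt/testfile bs=1M count=1024 conv=fsync   # write test
>>>   dd if=/mnt/testfile of=/dev/null bs=1M   # read test
>>>   umount /mnt
>>> )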
>>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



