Re: XFS Metadata corruption while activating OSD

From what I've heard, XFS has problems on ARM. Use btrfs, or (I
believe?) ext4+bluestore will work.

On Sun, Mar 11, 2018 at 9:49 PM, Christian Wuerdig
<christian.wuerdig@xxxxxxxxx> wrote:
> Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of
> storage? Literally everything posted on this list in relation to HW
> requirements and related problems will tell you that this simply isn't going
> to work. The slightest hint of a problem will simply kill the OSD nodes with
> OOM. Have you tried smaller disks, such as 1TB models (or even smaller,
> like 256GB SSDs), to see whether the same problem persists?
>
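To put a rough number on the RAM concern above, here is a minimal sketch. It assumes the commonly quoted rule of thumb of about 1 GB of RAM per 1 TB of OSD storage; that figure and the function name are assumptions for illustration, not numbers stated in this thread.

```python
# Rough RAM sizing check for the OSD node described in this thread.
# ASSUMPTION: "1 GB of RAM per 1 TB of storage" is a commonly quoted
# rule of thumb, not a figure taken from this thread.
GB_RAM_PER_TB = 1.0


def ram_shortfall_gb(disks_tb, ram_gb, gb_per_tb=GB_RAM_PER_TB):
    """Return how many GB of RAM the node is short (0.0 if it has enough)."""
    needed = sum(disks_tb) * gb_per_tb
    return max(0.0, needed - ram_gb)


# The node in question: two 10 TB disks and 2 GB of RAM.
shortfall = ram_shortfall_gb([10, 10], ram_gb=2)
print(f"short by about {shortfall:.0f} GB")
```

Under that assumption the node is short by roughly 18 GB, which is why testing with much smaller disks is a reasonable way to rule memory pressure in or out.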
>
> On Tue, 6 Mar 2018 at 10:51, 赵赵贺东 <zhaohedong@xxxxxxxxx> wrote:
>>
>> Hello ceph-users,
>>
>> This is a really tough problem for our team.
>> We have investigated it for a long time and tried many things, but we
>> can't solve it, and even the concrete cause of the problem is still
>> unclear to us.
>> Any solution, suggestion, or opinion would be highly appreciated!
>>
>> Problem Summary:
>> When we activate an OSD, metadata corruption appears on the disk being
>> activated. It happens 100% of the time.
>>
>> Admin nodes & MON node:
>> Platform: X86
>> OS: Ubuntu 16.04
>> Kernel: 4.12.0
>> Ceph: Luminous 12.2.2
>>
>> OSD nodes:
>> Platform: armv7
>> OS: Ubuntu 14.04
>> Kernel: 4.4.39
>> Ceph: Luminous 12.2.2
>> Disk: 10TB + 10TB
>> Memory: 2GB
>>
>> Deploy log:
>>
>>
>> dmesg log (sorry, the arms001-01 dmesg log has been lost, but the
>> metadata corruption errors on arms003-10 are the same as those on
>> arms001-01):
>> Mar  5 11:08:49 arms003-10 kernel: [  252.534232] XFS (sda1): Unmount and
>> run xfs_repair
>> Mar  5 11:08:49 arms003-10 kernel: [  252.539100] XFS (sda1): First 64
>> bytes of corrupted metadata buffer:
>> Mar  5 11:08:49 arms003-10 kernel: [  252.545504] eb82f000: 58 46 53 42 00
>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>> Mar  5 11:08:49 arms003-10 kernel: [  252.553569] eb82f010: 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.561624] eb82f020: fc 4e e3 89 50
>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>> Mar  5 11:08:49 arms003-10 kernel: [  252.569706] eb82f030: 00 00 00 00 80
>> 00 00 07 ff ff ff ff ff ff ff ff  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.577778] XFS (sda1): metadata I/O
>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>> Mar  5 11:08:49 arms003-10 kernel: [  252.602944] XFS (sda1): Metadata
>> corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data
>> block 0x48b9ff80
>> Mar  5 11:08:49 arms003-10 kernel: [  252.614170] XFS (sda1): Unmount and
>> run xfs_repair
>> Mar  5 11:08:49 arms003-10 kernel: [  252.619030] XFS (sda1): First 64
>> bytes of corrupted metadata buffer:
>> Mar  5 11:08:49 arms003-10 kernel: [  252.625403] eb901000: 58 46 53 42 00
>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>> Mar  5 11:08:49 arms003-10 kernel: [  252.633441] eb901010: 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.641474] eb901020: fc 4e e3 89 50
>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>> Mar  5 11:08:49 arms003-10 kernel: [  252.649519] eb901030: 00 00 00 00 80
>> 00 00 07 ff ff ff ff ff ff ff ff  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.657554] XFS (sda1): metadata I/O
>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>> Mar  5 11:08:49 arms003-10 kernel: [  252.675056] XFS (sda1): Metadata
>> corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data
>> block 0x48b9ff80
>> Mar  5 11:08:49 arms003-10 kernel: [  252.686228] XFS (sda1): Unmount and
>> run xfs_repair
>> Mar  5 11:08:49 arms003-10 kernel: [  252.691054] XFS (sda1): First 64
>> bytes of corrupted metadata buffer:
>> Mar  5 11:08:49 arms003-10 kernel: [  252.697425] eb901000: 58 46 53 42 00
>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>> Mar  5 11:08:49 arms003-10 kernel: [  252.705459] eb901010: 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.713489] eb901020: fc 4e e3 89 50
>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>> Mar  5 11:08:49 arms003-10 kernel: [  252.721520] eb901030: 00 00 00 00 80
>> 00 00 07 ff ff ff ff ff ff ff ff  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.729558] XFS (sda1): metadata I/O
>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>> Mar  5 11:08:49 arms003-10 kernel: [  252.741953] XFS (sda1): Metadata
>> corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data
>> block 0x48b9ff80
>> Mar  5 11:08:49 arms003-10 kernel: [  252.753139] XFS (sda1): Unmount and
>> run xfs_repair
>> Mar  5 11:08:49 arms003-10 kernel: [  252.757955] XFS (sda1): First 64
>> bytes of corrupted metadata buffer:
>> Mar  5 11:08:49 arms003-10 kernel: [  252.764336] eb901000: 58 46 53 42 00
>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.........s..
>> Mar  5 11:08:49 arms003-10 kernel: [  252.772365] eb901010: 00 00 00 00 00
>> 00 00 00 00 00 00 00 00 00 00 00  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.780395] eb901020: fc 4e e3 89 50
>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.....n../
>> Mar  5 11:08:49 arms003-10 kernel: [  252.788417] eb901030: 00 00 00 00 80
>> 00 00 07 ff ff ff ff ff ff ff ff  ................
>> Mar  5 11:08:49 arms003-10 kernel: [  252.796514] XFS (sda1): metadata I/O
>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>
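One observation on the dump above: the buffer that fails the xfs_dir3_data verifier begins with "XFSB", which is the XFS superblock magic, so it does not look like a directory block at all. The following is a minimal sketch that decodes the first 16 bytes, assuming the standard superblock layout (4-byte magic, 32-bit block size, 64-bit data block count, all big-endian). The variable names are mine.

```python
import struct

# First 16 bytes of the "corrupted metadata buffer" from the dmesg output.
dump = "58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb"
raw = bytes.fromhex(dump)

# XFS on-disk fields are big-endian: 4-byte magic, 32-bit block size,
# 64-bit count of data blocks in the filesystem.
magic, blocksize, dblocks = struct.unpack(">4sIQ", raw)

size_tb = dblocks * blocksize / 1e12  # filesystem size in decimal TB

print(magic)      # b'XFSB'
print(blocksize)  # 4096
print(dblocks)    # 2440298235
print(round(size_tb, 2))
```

The decoded size is just under 10 TB, which matches the disk in question. That suggests the read returned superblock contents where a directory block was expected, although it does not by itself explain why.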
>> What we have tried so far:
>> 1. Deployed the OSD manually; confirmed that the same error still occurs.
>> 2. Browsed the kernel bug-fix log, but found no related fix since
>> kernel 4.4.39.
>> 3. Upgraded xfsprogs from 3.1.9 to 4.15.0; the error number changed (from
>> 117 to 74), but the disk is still corrupted while activating the OSD:
>>
>> [2912641.987937] XFS (sda1): Metadata CRC error detected at
>> xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0xfffffff0
>> [2912641.999203] XFS (sda1): Unmount and run xfs_repair
>> [2912642.004202] XFS (sda1): First 64 bytes of corrupted metadata buffer:
>> [2912642.010759] e689a000: 58 46 53 42 00 00 10 00 00 00 00 00 91 73 fe fb
>> XFSB.........s..
>> [2912642.018958] e689a010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ................
>> [2912642.027177] e689a020: 61 7b 64 0d fa fe 41 14 bf ea 90 32 6c 73 e5 ad
>> a{d...A....2ls..
>> [2912642.035388] e689a030: 00 00 00 00 50 00 00 08 ff ff ff ff ff ff ff ff
>> ....P...........
>> [2912642.043630] XFS (sda1): metadata I/O error: block 0xfffffff0
>> ("xfs_trans_read_buf_map") error 74 numblks 8
>> [2912642.060390] XFS (sda1): Metadata CRC error detected at
>> xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data block 0xfffffff0
>> [2912642.071673] XFS (sda1): Unmount and run xfs_repair
>>
>> 4. Using the same disk for an OSD on an x86 node does not trigger the
>> problem (confirmed).
>> 5. Formatting the disk with sgdisk and mkfs.xfs, mounting it, running some
>> dd read/write tests, and then unmounting it does not trigger the problem
>> (confirmed).
>> 6. Changing the Ceph version from 12.2.2 to 10.2.10 does not help; the
>> problem still exists (confirmed).
>>    10.2.0 Deploy log
>>
>>    The corruption error is the same as with 12.2.2.
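On the changed error number in item 3: the two values in the logs are ordinary Linux errno codes that XFS reuses under its own names. A small lookup sketch follows; the table is hard-coded here rather than read from the system, and the short descriptions are my paraphrase.

```python
# Linux errno values seen in the two dmesg excerpts, with the aliases the
# XFS kernel code defines for them (EFSCORRUPTED = EUCLEAN,
# EFSBADCRC = EBADMSG). Descriptions are paraphrased, not kernel text.
XFS_ERRORS = {
    117: ("EUCLEAN", "EFSCORRUPTED", "metadata failed structural verification"),
    74: ("EBADMSG", "EFSBADCRC", "metadata checksum (CRC) mismatch"),
}


def describe(errno_value):
    """Return a one-line description of an errno value from the logs."""
    name, xfs_alias, meaning = XFS_ERRORS[errno_value]
    return f"error {errno_value} = {name} ({xfs_alias}): {meaning}"


for value in (117, 74):
    print(describe(value))
```

This lines up with the log text: the first excerpt says "Metadata corruption detected" with error 117, and the second says "Metadata CRC error detected" with error 74. A plausible reading, which is my assumption, is that the filesystem made with the newer mkfs.xfs has metadata CRCs enabled, so the checksum check fails before the structural check runs. Either way, the underlying symptom is the same.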
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



