Thanks for the response, Gregory.
We need to support a couple of production services that we have already migrated to Ceph, so we are in a bit of a soup.
The cluster is as follows:
```
ceph osd tree
ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1        11.06848 root default
-7         5.45799     host master
 5   hdd   5.45799         osd.5       up  1.00000 1.00000
-5         1.81940     host node2
 7   hdd   1.81940         osd.7       up  1.00000 1.00000
-3         1.81940     host node3
 8   hdd   1.81940         osd.8       up  1.00000 1.00000
-9         1.81940     host node4
 6   hdd   1.81940         osd.6       up  1.00000 1.00000
-11        0.15230     host node5
 9   hdd   0.15230         osd.9       up  1.00000 1.00000
```
We have the Ceph cluster and the Kubernetes cluster installed on the same nodes (CentOS 7).
We were seeing low write performance from the Ceph cluster, roughly 10.5 MB/s with ```dd if=/dev/zero of=./here bs=1M count=1024 oflag=direct```.
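For reference, that dd run is the only number we have; we have not yet re-run a raw RADOS benchmark on the current cluster. What we intend to run (the pool name below is just a placeholder for one of our two pools) is roughly:
```
# untested sketch -- <pool> stands in for one of our two pools
rados bench -p <pool> 10 write --no-cleanup   # 10 seconds of 4 MB object writes
rados bench -p <pool> 10 seq                  # sequential reads of the objects just written
rados -p <pool> cleanup                       # remove the benchmark objects afterwards
```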
To address this, we were in the process of adding an additional NIC to each node: rebooting them one by one, confirming that the rebooted node came back healthy, and then moving on to the next.
After every couple of node reboots, the MDS would go down and report data damage. We would follow the disaster-recovery documentation and all would be merry again.
For the last couple of days, however, the MDS has not come back up, and the disaster-recovery procedure no longer works.
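To be clear about what we did each time: the recovery we kept repeating was roughly the standard CephFS disaster-recovery sequence from the docs (reproduced here from memory, so exact flags may differ slightly for our release):
```
# roughly the sequence we followed each time (from memory, per the disaster-recovery docs)
cephfs-journal-tool journal export backup.bin        # back up the MDS journal first
cephfs-journal-tool event recover_dentries summary   # salvage what the journal still holds
cephfs-journal-tool journal reset                    # then discard the damaged journal
cephfs-table-tool all reset session                  # drop stale client sessions
ceph fs reset cephfs --yes-i-really-mean-it          # we run a single active MDS, so reset applies
```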
cluster conf:
```
[global]
fsid = 2ed909ef-e3d7-4081-b01a-d04d12a1155d
mon_initial_members = master, node3, node2
mon_host = 10.10.73.45,10.10.73.44,10.10.73.43
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
public_network = 10.10.73.0/24
osd pool default size = 2 # Write an object 2 times.
osd pool default min size = 2
mon allow pool delete = true
cluster network = 10.10.73.0/24
max open files = 131072
[mon]
mon data = "">
[osd]
osd data = "">
osd journal size = 20000
osd mkfs type = xfs
osd mkfs options xfs = -f
filestore xattr use omap = true
filestore min sync interval = 10
filestore max sync interval = 15
filestore queue max ops = 25000
filestore queue max bytes = 10485760
filestore queue committing max ops = 5000
filestore queue committing max bytes = 10485760000
journal max write bytes = 1073714824
journal max write entries = 10000
journal queue max ops = 50000
journal queue max bytes = 10485760000
osd max write size = 512
osd client message size cap = 2147483648
osd deep scrub stride = 131072
osd op threads = 8
osd disk threads = 4
osd map cache size = 1024
osd map cache bl size = 128
osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
osd recovery op priority = 4
osd recovery max active = 10
osd max backfills = 4
osd skip data digest = true
[client]
rbd cache = true
rbd cache size = 268435456
rbd cache max dirty = 134217728
rbd cache max dirty age = 5
```
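In case the daemons are not actually running with the values above, we plan to double-check the effective settings through the admin socket (daemon names taken from the outputs in this mail):
```
# check effective settings on running daemons via the admin socket
ceph daemon osd.5 config get osd_max_backfills
ceph daemon mds.master config show | grep -i mds_log
```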
ceph health:
```
master@~/ ceph -s
  cluster:
    id:     2ed909ef-e3d7-4081-b01a-d04d12a1155d
    health: HEALTH_ERR
            4 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum node2,node3,master
    mgr: master(active)
    mds: cephfs-1/1/1 up {0=master=up:active(laggy or crashed)}
    osd: 5 osds: 5 up, 5 in

  data:
    pools:   2 pools, 300 pgs
    objects: 194.1 k objects, 33 GiB
    usage:   131 GiB used, 11 TiB / 11 TiB avail
    pgs:     299 active+clean
             1   active+clean+inconsistent
```
ceph health detail
```
ceph health detail
HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 1.43 is active+clean+inconsistent, acting [5,8,7]
```
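We have not yet touched the inconsistent PG; our understanding of the usual next step is below (osd.5 is the acting primary), and we would appreciate confirmation before running the repair:
```
# list the objects/shards in pg 1.43 that failed scrub
rados list-inconsistent-obj 1.43 --format=json-pretty
# ask the primary (osd.5) to repair the PG -- only once it is confirmed safe
ceph pg repair 1.43
```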
The MDS logs have already been provided earlier in the thread. I sincerely appreciate you reading through it all.
Thanks,