On Sat, Aug 11, 2018 at 1:21 PM Amit Handa <amit.handa@xxxxxxxxx> wrote:
>
> Thanks for the response, Gregory.
>
> We need to support a couple of production services we have migrated to ceph, so we are in a bit of a soup.
>
> The cluster is as follows:
> ```
> ceph osd tree
> ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
>  -1       11.06848 root default
>  -7        5.45799     host master
>   5   hdd  5.45799         osd.5       up  1.00000 1.00000
>  -5        1.81940     host node2
>   7   hdd  1.81940         osd.7       up  1.00000 1.00000
>  -3        1.81940     host node3
>   8   hdd  1.81940         osd.8       up  1.00000 1.00000
>  -9        1.81940     host node4
>   6   hdd  1.81940         osd.6       up  1.00000 1.00000
> -11        0.15230     host node5
>   9   hdd  0.15230         osd.9       up  1.00000 1.00000
> ```
>
> We have installed the ceph cluster and a kubernetes cluster on the same nodes (CentOS 7).
> We were seeing low performance from the ceph cluster, ~10.5 MB/s as measured with
> ```dd if=/dev/zero of=./here bs=1M count=1024 oflag=direct```
> so we were in the process of adding an additional NIC to each node: rebooting the nodes one by one, making sure each rebooted node came back healthy before proceeding to the next.
> After every couple of reboots, the mds would go down and report data damage.
> We would follow the disaster recovery guide and all would be well again.
>
> For the last couple of days, however, the mds hasn't come up, and the disaster recovery procedure no longer works.
>

try the following steps (a command-level sketch follows the list):

1. umount all cephfs clients first (kill ceph-fuse, `umount -f` any kernel mounts)
2. start the mds, then run 'ceph daemon mds.x flush journal'
3. stop the mds
4. run "cephfs-data-scan scan_links"
5. use "cephfs-table-tool cephfs:0 take_inos ..." to take some free inode numbers (10k should be enough)
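To make the steps concrete, here is roughly what they look like on the command line. This is only a sketch: it assumes the filesystem is named `cephfs`, the single MDS daemon id is `master` (matching the `ceph -s` output below), a systemd-based CentOS install, and a hypothetical client mount point `/mnt/cephfs`; the `take_inos` argument is a placeholder you must work out for your cluster.

```
# 1. unmount every cephfs client (kernel mounts and ceph-fuse)
umount -f /mnt/cephfs            # repeat on every client node
pkill ceph-fuse                  # only if ceph-fuse clients exist

# 2. start the mds and flush its journal through the admin socket
systemctl start ceph-mds@master
ceph daemon mds.master flush journal

# 3. stop the mds again before running the offline tools
systemctl stop ceph-mds@master

# 4. rebuild link/backtrace information from the metadata objects
cephfs-data-scan scan_links

# 5. reserve free inode numbers on rank 0 so newly created files cannot
#    collide (<inos> is a placeholder: roughly the current maximum
#    inode number plus 10000)
cephfs-table-tool cephfs:0 take_inos <inos>
```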
> cluster conf:
> ```
> [global]
> fsid = 2ed909ef-e3d7-4081-b01a-d04d12a1155d
> mon_initial_members = master, node3, node2
> mon_host = 10.10.73.45,10.10.73.44,10.10.73.43
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
>
> public_network = 10.10.73.0/24
> osd pool default size = 2        # Write an object 2 times.
> osd pool default min size = 2
> mon allow pool delete = true
>
> cluster network = 10.10.73.0/24
>
> max open files = 131072
>
> [mon]
> mon data = /var/lib/ceph/mon/ceph-$id
>
> [osd]
> osd data = /var/lib/ceph/osd/ceph-$id
> osd journal size = 20000
> osd mkfs type = xfs
> osd mkfs options xfs = -f
>
> filestore xattr use omap = true
> filestore min sync interval = 10
> filestore max sync interval = 15
> filestore queue max ops = 25000
> filestore queue max bytes = 10485760
> filestore queue committing max ops = 5000
> filestore queue committing max bytes = 10485760000
>
> journal max write bytes = 1073714824
> journal max write entries = 10000
> journal queue max ops = 50000
> journal queue max bytes = 10485760000
>
> osd max write size = 512
> osd client message size cap = 2147483648
> osd deep scrub stride = 131072
> osd op threads = 8
> osd disk threads = 4
> osd map cache size = 1024
> osd map cache bl size = 128
> osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
> osd recovery op priority = 4
> osd recovery max active = 10
> osd max backfills = 4
> osd skip data digest = true
>
> [client]
> rbd cache = true
> rbd cache size = 268435456
> rbd cache max dirty = 134217728
> rbd cache max dirty age = 5
> ```
>
> ceph health:
> ```
> master@~/ ceph -s
>   cluster:
>     id:     2ed909ef-e3d7-4081-b01a-d04d12a1155d
>     health: HEALTH_ERR
>             4 scrub errors
>             Possible data damage: 1 pg inconsistent
>
>   services:
>     mon: 3 daemons, quorum node2,node3,master
>     mgr: master(active)
>     mds: cephfs-1/1/1 up {0=master=up:active(laggy or crashed)}
>     osd: 5 osds: 5 up, 5 in
>
>   data:
>     pools:   2 pools, 300 pgs
>     objects: 194.1 k objects, 33 GiB
>     usage:   131 GiB used, 11 TiB / 11 TiB avail
>     pgs:     299 active+clean
>              1   active+clean+inconsistent
> ```
>
> ceph health detail:
> ```
> ceph health detail
> HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 4 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 1.43 is active+clean+inconsistent, acting [5,8,7]
> ```
>
> The mds logs have already been provided. I sincerely appreciate you reading through it all.
>
> Thanks,
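Separately from the MDS recovery, the `pg 1.43 is active+clean+inconsistent` error shown in `ceph health detail` can be inspected and repaired on its own. A minimal sketch, assuming the four scrub errors all belong to pg 1.43 as the health output suggests; it is worth reviewing the inconsistency report before repairing, since repair generally trusts the copy it deems authoritative:

```
# list the objects/shards that deep scrub flagged as inconsistent
rados list-inconsistent-obj 1.43 --format=json-pretty

# ask the pg's primary (osd.5 in the acting set [5,8,7]) to repair it
ceph pg repair 1.43

# re-check overall health afterwards
ceph -s
```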