Hi Marcus,

for quick relief you could perhaps increase mon_osd_full_ratio. What values do
you have at the moment? Please post the output of the following (on host ceph1,
since that is where osd.0's admin socket lives):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep full_ratio

Beyond that it would be helpful to have at least two OSDs on every host:
assuming the default CRUSH rule that places one replica per host, a pool size
of 3 with only four hosts means every PG needs three different hosts, so a
large share of the data will stay on the three single-OSD hosts no matter how
you weight them.

Some more comments inline and at the bottom.

Udo

On 01.11.2016 20:14, Marcus Müller wrote:
> Hi all,
>
> I have a big problem and I really hope someone can help me!
>
> We have been running a Ceph cluster for a year now. The version is
> 0.94.7 (Hammer). Here is some info.
>
> Our osd tree is:
>
> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 26.67998 root default
> -2  3.64000     host ceph1
>  0  3.64000         osd.0       up  1.00000          1.00000
> -3  3.50000     host ceph2
>  1  3.50000         osd.1       up  1.00000          1.00000
> -4  3.64000     host ceph3
>  2  3.64000         osd.2       up  1.00000          1.00000
> -5 15.89998     host ceph4
>  3  4.00000         osd.3       up  1.00000          1.00000
>  4  3.59999         osd.4       up  1.00000          1.00000
>  5  3.29999         osd.5       up  1.00000          1.00000
>  6  5.00000         osd.6       up  1.00000          1.00000
>
> ceph df:
>
> GLOBAL:
>     SIZE       AVAIL      RAW USED     %RAW USED
>     40972G     26821G       14151G         34.54
> POOLS:
>     NAME        ID     USED      %USED     MAX AVAIL     OBJECTS
>     blocks      7      4490G     10.96         1237G     7037004
>     commits     8       473M         0         1237G      802353
>     fs          9      9666M      0.02         1237G     7863422
>
> ceph osd df:
>
> ID WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR
>  0 3.64000  1.00000  3724G   3128G    595G  84.01 2.43
>  1 3.50000  1.00000  3724G   3237G    487G  86.92 2.52
>  2 3.64000  1.00000  3724G   3180G    543G  85.41 2.47
>  3 4.00000  1.00000  7450G   1616G   5833G  21.70 0.63
>  4 3.59999  1.00000  7450G   1246G   6203G  16.74 0.48
>  5 3.29999  1.00000  7450G   1181G   6268G  15.86 0.46
>  6 5.00000  1.00000  7450G    560G   6889G   7.52 0.22
>             TOTAL   40972G  14151G  26820G  34.54
> MIN/MAX VAR: 0.22/2.52  STDDEV: 36.53
>
> Our current cluster state is:
>
>      health HEALTH_WARN
>             63 pgs backfill
>             8 pgs backfill_toofull
>             9 pgs backfilling
>             11 pgs degraded
>             1 pgs recovering
>             10 pgs recovery_wait
>             11 pgs stuck degraded
>             89 pgs stuck unclean
>             recovery 8237/52179437 objects degraded (0.016%)
>             recovery 9620295/52179437 objects misplaced (18.437%)
>             2 near full osd(s)
>             noout,noscrub,nodeep-scrub flag(s) set
>      monmap e8: 4 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0}
>             election epoch 400, quorum 0,1,2,3 ceph1,ceph2,ceph3,ceph4
>      osdmap e1774: 7 osds: 7 up, 7 in; 84 remapped pgs
>             flags noout,noscrub,nodeep-scrub
>       pgmap v7316159: 320 pgs, 3 pools, 4501 GB data, 15336 kobjects
>             14152 GB used, 26820 GB / 40972 GB avail
>             8237/52179437 objects degraded (0.016%)
>             9620295/52179437 objects misplaced (18.437%)
>                  231 active+clean
>                   61 active+remapped+wait_backfill
>                    9 active+remapped+backfilling
>                    6 active+recovery_wait+degraded+remapped
>                    6 active+remapped+backfill_toofull
>                    4 active+recovery_wait+degraded
>                    2 active+remapped+wait_backfill+backfill_toofull
>                    1 active+recovering+degraded
> recovery io 11754 kB/s, 35 objects/s
>   client io 1748 kB/s rd, 249 kB/s wr, 44 op/s
>
> My main problems are:
>
> - As you can see from the osd tree, we have three separate hosts with
>   only one osd each and another host with four osds. Ceph does not let
>   me get the data off these three single-OSD nodes, which are all near
>   full. I tried to set the weight of the osds in the bigger node
>   higher, but that simply does not work. So I added a new osd
>   yesterday, which has not made things better, as you can now see.
>   What do I have to do to get these three nodes empty again and put
>   more data on the node with the four HDDs?
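To your first point: a quick, temporary way to get the backfill_toofull PGs
moving again is to loosen the full thresholds a little. The commands below are
only a sketch with example values for a Hammer cluster, not a recommendation
for exact numbers; check what is currently configured first, and keep in mind
that this only buys headroom, it does not move any data off the small hosts:

# see what is currently configured (run on ceph1 for osd.0)
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep full_ratio

# temporarily allow backfill onto fuller OSDs (the Hammer default is 0.85;
# 0.92 is only an example value)
ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.92'

# if needed, also raise the cluster-wide near-full / full thresholds a bit
# (pre-Luminous syntax; again, example values only)
ceph pg set_nearfull_ratio 0.90
ceph pg set_full_ratio 0.97

Set the values back once the cluster has settled; running OSDs close to full
is risky.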
>
> - I added the „ceph4“ node later; this resulted in a strange IP
>   change, as you can see in the mon list. The public network and the
>   cluster network were swapped or not assigned correctly. See
>   ceph.conf:
>
> [global]
> fsid = xxx
> mon_initial_members = ceph1
> mon_host = 192.168.10.3, 192.168.10.4, 192.168.10.5, 192.168.10.11
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> public_network = 192.168.60.0/24
> cluster_network = 192.168.10.0/24
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 128
> osd pool default pgp num = 128
> osd recovery max active = 50
> osd recovery threads = 3
> mon_pg_warn_max_per_osd = 0
>
> What can I do in this case (it's not a big problem, since the network
> is 2x 10 GbE and everything works)?
>
> - One other thing: even if I just prepare an osd, it is automatically
>   added to the cluster and I cannot activate it myself. Has anyone
>   else seen this behavior?
>
> I am now trying to delete some data in the cluster, which has already
> helped a bit:
>
>      health HEALTH_WARN
>             63 pgs backfill
>             8 pgs backfill_toofull
>             10 pgs backfilling
>             7 pgs degraded
>             3 pgs recovery_wait
>             7 pgs stuck degraded
>             82 pgs stuck unclean
>             recovery 6498/52085528 objects degraded (0.012%)
>             recovery 9507140/52085528 objects misplaced (18.253%)
>             2 near full osd(s)
>             noout,noscrub,nodeep-scrub flag(s) set
>      monmap e8: 4 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0}
>             election epoch 400, quorum 0,1,2,3 ceph1,ceph2,ceph3,ceph4
>      osdmap e1780: 7 osds: 7 up, 7 in; 81 remapped pgs
>             flags noout,noscrub,nodeep-scrub
>       pgmap v7317114: 320 pgs, 3 pools, 4499 GB data, 15333 kobjects
>             14100 GB used, 26872 GB / 40972 GB avail
>             6498/52085528 objects degraded (0.012%)
>             9507140/52085528 objects misplaced (18.253%)
>                  238 active+clean
>                   60 active+remapped+wait_backfill
>                    7 active+remapped+backfilling
>                    6 active+remapped+backfill_toofull
>                    3 active+degraded+remapped+backfilling
>                    2 active+remapped+wait_backfill+backfill_toofull
>                    2 active+recovery_wait+degraded+remapped
>                    1 active+degraded+remapped+wait_backfill
>                    1 active+recovery_wait+degraded
> recovery io 7844 kB/s, 27 objects/s
>   client io 343 kB/s rd, 1 op/s
>
> If you need more information, just say so. I really need help!
>
> Thank you so far for reading!
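About your ceph.conf question: the monitors do not use the cluster network at
all, they are meant to live on the public network. Right now ceph1-ceph3 sit
on 192.168.10.x (your cluster_network) while ceph4 sits on 192.168.60.6 (your
public_network), and mon_host also lists 192.168.10.11, which does not appear
in the monmap. If it causes no real problems you can leave it, but the usual
layout looks roughly like the sketch below. Note that the 192.168.60.3-5
addresses are only placeholders (I do not know the real 60.x addresses of
ceph1-ceph3), and actually moving a monitor to a new address means removing
and re-adding it (or editing and re-injecting the monmap), so do not just
change the file and restart:

[global]
# sketch only: monitors on the public network; mon_host must match the
# addresses that end up in the monmap
public_network = 192.168.60.0/24
cluster_network = 192.168.10.0/24
# placeholder addresses for ceph1-ceph3 -- use their real 192.168.60.x IPs
mon_host = 192.168.60.3, 192.168.60.4, 192.168.60.5, 192.168.60.6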