Re: extending ceph cluster with osds close to near full ratio (85%)

On Tue, Feb 14, 2017 at 5:27 AM, Tyanko Aleksiev <tyanko.alexiev@xxxxxxxxx> wrote:
Hi Cephers,
At University of Zurich we are using Ceph as a storage back-end for our
OpenStack installation. Since we recently reached 70% occupancy (mostly
caused by the cinder pool, served by 16384 PGs), we are in the process of
extending the cluster with additional storage nodes of the same type
(except for a slightly more powerful CPU).

We decided to opt for a gradual OSD deployment: we created a temporary "root"
bucket called "fresh-install" containing the newly installed nodes and then we
moved OSDs from this bucket to the current production root via:

ceph osd crush set osd.{id} {weight} host={hostname} root={production_root}
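
Schematically, the setup of the temporary bucket was something like the
following (illustrative only; the bucket and host names are the ones that
appear in the crushmap excerpts below):

    # create the temporary root and a host bucket for a freshly installed node
    ceph osd crush add-bucket fresh-install root
    ceph osd crush add-bucket osd-k7-41-fresh host
    ceph osd crush move osd-k7-41-fresh root=fresh-install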

Everything seemed nicely planned, but when we started adding a few new
OSDs to the cluster, thus triggering a rebalance, one of the OSDs,
already at 84% disk use, passed the 85% threshold. This in turn
triggered the "near full osd(s)" warning, and more than 20 PGs previously
in "wait_backfill" state were marked "wait_backfill+backfill_toofull".
Since the OSD kept growing until it reached 90% disk use, we decided to
reduce its relative weight from 1 to 0.95.
That action recalculated the crushmap and remapped a few PGs, but it did
not appear to move any data off the almost full OSD. Only when, in steps
of 0.05, we reached a relative weight of 0.50 was data moved and some
"backfill_toofull" requests released. However, we had to go down almost
to a relative weight of 0.10 in order to trigger some additional data
movement and have the backfilling process finally finish.
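
For reference, the reweighting commands were of this form (a sketch; {id}
stands for the nearly full OSD):

    # check per-OSD utilization
    ceph osd df tree
    # lower the override (reweight) value of the nearly full OSD in small steps
    ceph osd reweight {id} 0.95
    ceph osd reweight {id} 0.90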

We are now adding new OSDs, but the problem is constantly triggered since
we have multiple OSDs above 83% that start growing during the rebalance.

My questions are:

- Is there something wrong with our process of adding new OSDs (some additional
details below)?

It could work, but it could also be more disruptive than it needs to be. We have a similar situation/configuration, and what we do is start OSDs with `osd crush initial weight = 0` as well as "crush_osd_location" set properly. This brings the OSDs up with a crush weight of 0 and lets us bring them in in a controlled fashion: we first bring their override (reweight) value to 1 (no disruption, since the crush weight is still 0), then increase the crush weight gradually.
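
A minimal sketch of that flow, assuming the Hammer-era option and command
names (the 3.64 target below is just the per-disk crush weight from your
crushmap):

    # in ceph.conf on the new OSD hosts, before the OSDs are created
    [osd]
    osd crush initial weight = 0

    # the OSD comes up with crush weight 0, so no data moves on creation;
    # make sure its override (reweight) value is 1 before weighting it in
    ceph osd reweight {id} 1.0

    # then raise the crush weight in steps, letting the cluster settle in
    # between, until the disk's full weight is reached
    ceph osd crush reweight osd.{id} 0.5
    ceph osd crush reweight osd.{id} 1.0
    ceph osd crush reweight osd.{id} 3.64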
 
- We also noticed that the problem has the tendency to cluster around the newly
added OSDs, so could those two things be correlated?
I'm not sure which problem you are referring to - the OSDs filling up? Possibly due to temporary files, or some other mechanism I'm not familiar with, adding a little extra data on top.
- Why does reweighting not trigger immediate data movement? What's the logic
behind remapped PGs? Is there some sort of flat queue of tasks, or are
there priorities defined?

It should; perhaps you aren't choosing large enough increments, or perhaps you have some settings in place that are limiting data movement.
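
If an increment really does nothing, two generic things worth checking (not
specific to your cluster):

    # make sure no cluster flags are blocking data movement
    ceph osd dump | grep flags        # look for nobackfill / norebalance
    # and see how aggressively backfill is throttled on a given OSD
    ceph daemon osd.{id} config show | grep -E 'osd_max_backfills|osd_recovery_max_active'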
 
- Has anybody experienced this situation and, if so, how was it solved/bypassed?

FWIW, we also run a rebalance cronjob every hour with the following:

`ceph osd reweight-by-utilization 103 .010 10`

It was detailed in another recent thread on this list.
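
In cron form that is just something like the following (the arguments are the
utilization threshold in percent, the maximum weight change per OSD, and the
maximum number of OSDs touched per run; the file name and schedule here are
illustrative):

    # /etc/cron.d/ceph-rebalance -- hourly, gentle automatic reweighting
    0 * * * *  root  ceph osd reweight-by-utilization 103 .010 10
    # newer releases also offer 'ceph osd test-reweight-by-utilization' as a dry run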
 
Cluster details are as follows:

- version: 0.94.9
- 5 monitors,
- 40 storage hosts, each with 24 x 4 TB disks: 1 OSD/disk (960 OSDs in total),
- osd pool default size = 3,
- journals are on SSDs.

We have "hosts" failure domain. Relevant crushmap details:

# rules
rule sas {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take sas
        step chooseleaf firstn 0 type host
        step emit
}

root sas {
        id -41          # do not change unnecessarily
        # weight 3283.279
        alg straw
        hash 0          # rjenkins1
        item osd-l2-16 weight 87.360
        item osd-l4-06 weight 87.360
        ...
        item osd-k7-41 weight 14.560
        item osd-l4-36 weight 14.560
        item osd-k5-36 weight 14.560
}

host osd-k7-21 {
        id -46          # do not change unnecessarily
        # weight 87.360
        alg straw
        hash 0          # rjenkins1
        item osd.281 weight 3.640
        item osd.282 weight 3.640
        item osd.285 weight 3.640
        ...
}

host osd-k7-41 {
        id -50          # do not change unnecessarily
        # weight 14.560
        alg straw
        hash 0          # rjenkins1
        item osd.900 weight 3.640
        item osd.901 weight 3.640
        item osd.902 weight 3.640
        item osd.903 weight 3.640
}


As mentioned before, we created a temporary bucket called "fresh-install"
containing the newly installed nodes, i.e.:

root fresh-install {
        id -34          # do not change unnecessarily
        # weight 218.400
        alg straw
        hash 0          # rjenkins1
        item osd-k5-36-fresh weight 72.800
        item osd-k7-41-fresh weight 72.800
        item osd-l4-36-fresh weight 72.800
}

Then, in steps of 6 OSDs (2 OSDs from each new host), we moved OSDs from
the "fresh-install" to the "sas" bucket.

I would highly recommend a simple script to weight them in gradually, as described above. It's much more controllable, and you can twiddle the knobs to your heart's desire.
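
A rough, untested sketch of what such a script could look like (the step
size, target weight and health check are all knobs you would tune; the
script name and argument handling are made up here):

    #!/bin/bash
    # gradually bring a new OSD up to its full crush weight
    OSD_ID=$1          # e.g. 900
    TARGET=3.64        # full crush weight of the disk
    STEP=0.25          # crush weight increase per iteration

    current=0
    while awk -v c="$current" -v t="$TARGET" 'BEGIN { exit !(c < t) }'; do
        # compute the next weight, capped at the target
        current=$(awk -v c="$current" -v s="$STEP" -v t="$TARGET" \
                  'BEGIN { n = c + s; if (n > t) n = t; printf "%.2f", n }')
        ceph osd crush reweight osd.${OSD_ID} ${current}
        # wait until the backfill/recovery caused by this step has finished
        while ceph health | grep -qE 'backfill|recovery'; do
            sleep 60
        done
    done

Run it once per new OSD (e.g. ./weight-in.sh 900) after the OSD has been
created with a crush weight of 0.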

Thank you in advance for all the suggestions.

Cheers,
Tyanko



Hope that helps.

--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
