Hi Brian,

On 14 February 2017 at 19:33, Brian Andrus <brian.andrus at dreamhost.com> wrote:
>
> On Tue, Feb 14, 2017 at 5:27 AM, Tyanko Aleksiev <tyanko.alexiev at gmail.com> wrote:
>
>> Hi Cephers,
>>
>> At University of Zurich we are using Ceph as a storage back-end for our
>> OpenStack installation. Since we recently reached 70% occupancy
>> (mostly caused by the cinder pool, served by 16384 PGs) we are in the
>> phase of extending the cluster with additional storage nodes of the same
>> type (except for a slightly more powerful CPU).
>>
>> We decided to opt for a gradual OSD deployment: we created a temporary
>> "root" bucket called "fresh-install" containing the newly installed nodes
>> and then we moved OSDs from this bucket to the current production root via:
>>
>> ceph osd crush set osd.{id} {weight} host={hostname} root={production_root}
>>
>> Everything seemed nicely planned, but when we started adding a few new
>> OSDs to the cluster, and thus triggering a rebalance, one of the OSDs,
>> already at 84% disk use, passed the 85% threshold. This in turn
>> triggered the "near full osd(s)" warning and more than 20 PGs previously
>> in "wait_backfill" state were marked as "wait_backfill+backfill_toofull".
>> Since the OSD kept growing until it reached 90% disk use, we decided to
>> reduce its relative weight from 1 to 0.95.
>> That last action recalculated the crushmap and remapped a few PGs, but did
>> not appear to move any data off the almost-full OSD. Only when, in steps
>> of 0.05, we reached a relative weight of 0.50 was data moved and some
>> "backfill_toofull" requests released. However, we had to go down almost
>> to a relative weight of 0.10 in order to trigger some additional data
>> movement and have the backfilling process finally finish.
>>
>> We are now adding new OSDs, but the problem is constantly triggered since
>> we have multiple OSDs > 83% full that start growing during the rebalance.
>>
>> My questions are:
>>
>> - Is there something wrong in our process of adding new OSDs (some
>>   additional details below)?
>>
>
> It could work, but it could also be more disruptive than it needs to be.
> We have a similar situation/configuration and what we do is start OSDs
> with `osd crush initial weight = 0` as well as "crush_osd_location" set
> properly. This starts the OSDs at 0 weight and lets us bring them in in a
> controlled fashion. We bring them in to 1 (no disruption), then crush
> weight in gradually.
>

We are currently trying out this type of gradual insertion. Thanks!

>
>> - We also noticed that the problem has the tendency to cluster around the
>>   newly added OSDs, so could those two things be correlated?
>>
>
> I'm not sure which problem you are referring to - the OSDs filling up?
> Possibly due to temporary files or some other mechanism I'm not familiar
> with adding a little extra data on top.
>
>> - Why does reweighting not trigger instant data movement? What's the logic
>>   behind remapped PGs? Is there some sort of flat queue of tasks or does
>>   it have some priorities defined?
>>
>
> It should; perhaps you aren't choosing large enough increments or perhaps
> you have some settings set.
>

Indeed, with sufficiently large increments it does trigger some instant PG
rebalancing.

>
>> - Did somebody experience this situation and, eventually, how was it
>>   solved/bypassed?
>>
>
> FWIW, we also run a rebalance cronjob every hour with the following:
>
> `ceph osd reweight-by-utilization 103 .010 10`
>

Already running that, but on a daily basis.
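For completeness, a minimal sketch of what such a rebalance cron job could
look like; the wrapper script path and the "skip while backfilling" guard are
assumptions for the sketch, not something stated in this thread:

#!/bin/bash
# Hypothetical /usr/local/sbin/ceph-rebalance.sh, invoked from cron,
# e.g.:  0 * * * * root /usr/local/sbin/ceph-rebalance.sh
set -eu

# Skip this run if PGs are already backfilling or recovering, so the
# reweight does not pile on top of an ongoing rebalance (assumption,
# not part of Brian's cron job as described).
if ceph pg stat | grep -Eq 'backfill|recover'; then
    exit 0
fi

# 103  -> touch only OSDs more than 3% above the average utilization
# .010 -> change each override reweight by at most 0.01 per run
# 10   -> adjust at most 10 OSDs per run
ceph osd reweight-by-utilization 103 .010 10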
> it was detailed in another recent thread on [ceph-users]
>
>> Cluster details are as follows:
>>
>> - version: 0.94.9
>> - 5 monitors,
>> - 40 storage hosts with 24 x 4TB disks each, 1 OSD/disk (960 OSDs in total),
>> - osd pool default size = 3,
>> - journaling is on SSDs.
>>
>> We have a "host" failure domain. Relevant crushmap details:
>>
>> # rules
>> rule sas {
>>         ruleset 1
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take sas
>>         step chooseleaf firstn 0 type host
>>         step emit
>> }
>>
>> root sas {
>>         id -41          # do not change unnecessarily
>>         # weight 3283.279
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd-l2-16 weight 87.360
>>         item osd-l4-06 weight 87.360
>>         ...
>>         item osd-k7-41 weight 14.560
>>         item osd-l4-36 weight 14.560
>>         item osd-k5-36 weight 14.560
>> }
>>
>> host osd-k7-21 {
>>         id -46          # do not change unnecessarily
>>         # weight 87.360
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.281 weight 3.640
>>         item osd.282 weight 3.640
>>         item osd.285 weight 3.640
>>         ...
>> }
>>
>> host osd-k7-41 {
>>         id -50          # do not change unnecessarily
>>         # weight 14.560
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.900 weight 3.640
>>         item osd.901 weight 3.640
>>         item osd.902 weight 3.640
>>         item osd.903 weight 3.640
>> }
>>
>> As mentioned before, we created a temporary bucket called "fresh-install"
>> containing the newly installed nodes, i.e.:
>>
>> root fresh-install {
>>         id -34          # do not change unnecessarily
>>         # weight 218.400
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd-k5-36-fresh weight 72.800
>>         item osd-k7-41-fresh weight 72.800
>>         item osd-l4-36-fresh weight 72.800
>> }
>>
>> Then, in steps of 6 OSDs (2 OSDs from each new host), we move OSDs from
>> the "fresh-install" to the "sas" bucket.
>>
>
> I would highly recommend a simple script to weight in gradually as
> described above. Much more controllable and you can twiddle the knobs to
> your heart's desire.
>
>>
>> Thank you in advance for all the suggestions.
>>
>> Cheers,
>> Tyanko
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> Hope that helps.
>

Thanks for the suggestions.

Cheers,
Tyanko

> --
> Brian Andrus | Cloud Systems Engineer | DreamHost
> brian.andrus at DreamHost.com | www.dreamhost.com
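A minimal sketch of the "simple script to weight in gradually" that Brian
recommends might look like the following; it assumes the new OSDs were started
with `osd crush initial weight = 0`, and the OSD id, target weight, step size
and the backfill check are illustrative assumptions, not something taken from
this thread:

#!/bin/bash
# Sketch: gradually raise the CRUSH weight of one newly added OSD,
# waiting for backfill/recovery to settle between increments.
# Usage (hypothetical): ./crush-weight-in.sh 900 3.640 0.2
set -eu

OSD_ID=$1          # e.g. 900
TARGET=$2          # e.g. 3.640 (full CRUSH weight of a 4TB disk)
STEP=${3:-0.2}     # CRUSH weight increment per iteration

weight=0           # assumes `osd crush initial weight = 0`
while (( $(echo "$weight < $TARGET" | bc -l) )); do
    next=$(echo "$weight + $STEP" | bc -l)
    # Do not overshoot the target weight.
    if (( $(echo "$next > $TARGET" | bc -l) )); then
        next=$TARGET
    fi
    weight=$next

    ceph osd crush reweight osd.$OSD_ID $weight

    # Let the resulting backfill finish before the next increment.
    while ceph pg stat | grep -Eq 'backfill|recover'; do
        sleep 60
    done
done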