https://docs.ceph.com/en/pacific/rados/operations/upmap/

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Aug 23, 2022 at 1:45 PM Wyll Ingersoll <
wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:

> Thank you - we have increased the backfill settings, but can you elaborate
> on "injecting upmaps"?
> ------------------------------
> *From:* Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
> *Sent:* Tuesday, August 23, 2022 1:44 PM
> *To:* Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
> *Cc:* ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> *Subject:* Re: Full cluster, new OSDS not being used
>
> In that case, I would say your options are to make use of injecting upmaps
> to move data off the full OSDs, or to increase the backfill throttle
> settings to make things move faster.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Aug 23, 2022 at 1:28 PM Wyll Ingersoll <
> wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
> Unfortunately, I cannot. The system in question is in a secure location
> and I don't have direct access to it. The person on site runs the commands
> I send them, and the osd tree is correct as far as we can tell. The new
> hosts and OSDs are in the right place in the tree and have proper weights.
> One small difference is that the new OSDs have a class ("hdd"), whereas
> MOST of the pre-existing OSDs do not have a class designation; this is a
> cluster that has grown and been upgraded over several releases of Ceph.
> Currently it is running Pacific 16.2.9. However, removing the class
> designation on one of the new OSDs did not make any difference, so I don't
> think that is the issue.
>
> The cluster is slowly recovering, but our new OSDs are very lightly used
> at this point. Only a few PGs have been assigned to them, though more than
> zero, and the number does appear to be growing, albeit very slowly, so
> recovery is happening, just very, very slowly.
>
> ------------------------------
> *From:* Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
> *Sent:* Tuesday, August 23, 2022 1:18 PM
> *To:* Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
> *Cc:* ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> *Subject:* Re: Full cluster, new OSDS not being used
>
> Can you please send the output of "ceph osd tree"?
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Aug 23, 2022 at 10:53 AM Wyll Ingersoll <
> wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
> We have a large cluster with many OSDs that are at their nearfull or
> full ratio limit and are thus having problems rebalancing.
> We added 2 more storage nodes, each with 20 additional drives, to give the
> cluster room to rebalance. However, for the past few days, the new OSDs
> are NOT being used and the cluster remains stuck and is not improving.
>
> The crush map is correct; the new hosts and OSDs are at the correct
> location, but they don't seem to be getting used.
>
> Any idea how we can force the full or backfillfull OSDs to start unloading
> their PGs to the newly added ones?
>
> thanks!
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
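
As a rough illustration of the two options Wes describes above (injecting upmaps and raising the backfill throttles), the commands below are a minimal sketch against a Pacific cluster. The PG ID (2.3f) and OSD IDs (osd.12 as a full source OSD, osd.41 as one of the new, empty OSDs) are placeholders for illustration only; they must be replaced with values taken from your own "ceph pg ls-by-osd" and "ceph osd df" output, not from anything in this thread.

    # Upmaps require that every client speaks luminous or later.
    ceph osd set-require-min-compat-client luminous

    # List the PGs currently placed on a full OSD (placeholder: osd.12).
    ceph pg ls-by-osd 12

    # Remap one of those PGs (placeholder: 2.3f) from osd.12 onto a new,
    # empty OSD (placeholder: osd.41); repeat for a few PGs per full OSD.
    ceph osd pg-upmap-items 2.3f 12 41

    # Optionally loosen the backfill/recovery throttles while data moves,
    # and revert them once the cluster has rebalanced.
    ceph config set osd osd_max_backfills 3
    ceph config set osd osd_recovery_max_active 5

The Pacific upmap documentation linked at the top of this thread also describes an offline variant ("osdmaptool <osdmap-file> --upmap <out-file>") that computes a batch of pg-upmap-items commands for you rather than picking PGs by hand.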