Re: OSDs growing beyond full ratio

Hi Wyll,

The only way I could get my OSDs to start dropping their utilization during a similar "unable to access the fs" problem was to run "ceph osd crush reweight <osd> 0" on the full OSDs, then wait while they emptied and dropped below the full ratio.  Note that this is different from "ceph osd reweight ..." (which is missing the word "crush").  I know this goes against the documented best practices; I'm just relaying what worked for me recently.  I'm running 14.2.22 (Nautilus) and I think you're on Pacific, which is two major versions newer.
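
Roughly, the sequence was as follows (osd.12 is just an example ID; substitute whichever OSDs are full):

    # Drop the CRUSH weight to 0 so PGs backfill off the OSD:
    ceph osd crush reweight osd.12 0

    # Watch the utilization fall:
    ceph osd df tree

    # Once it's safely below the full ratio, restore the original
    # CRUSH weight (e.g. roughly 10.9 for a 12TB drive):
    ceph osd crush reweight osd.12 10.9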

In case it's important: we also had HDDs with SSD for DB/WAL.


@Ceph gurus: Is a file in Ceph assigned to a specific PG?  In my case it seems like a file that's close to the size of a single OSD gets moved from one OSD to the next, filling each one up and domino-ing around the cluster.
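
For what it's worth, as I understand it CephFS stripes a file into many RADOS objects (4MB each by default) and each object is hashed to a PG independently, so a single huge file should be spread across many PGs.  You can check where an individual object lands with "ceph osd map" (the pool and object names here are made up):

    # Shows the PG and the up/acting OSD set for one object:
    ceph osd map cephfs_data 10000000000.00000000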

Sincerely

-Dave


On 2022-08-30 8:04 a.m., Wyll Ingersoll wrote:




OSDs are bluestore on HDD with SSD for DB/WAL.  We already tuned the sleep_hdd to 0 and cranked up the max_backfills and recovery parameters to much higher values.


------------------------------------------------------------------------
*From:* Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
*Sent:* Tuesday, August 30, 2022 9:46 AM
*To:* Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
*Cc:* Dave Schulz <dschulz@xxxxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
*Subject:* Re:  Re: OSDs growing beyond full ratio
Hey Wyll,

I haven't been following this thread very closely so my apologies if
this has already been covered: Are the OSDs on HDDs or SSDs (or
hybrid)? If HDDs, you may want to look at decreasing
osd_recovery_sleep_hdd and increasing osd_max_backfills. YMMV, but
I've seen osd_recovery_sleep_hdd=0.01 and osd_max_backfills=6 work OK
on Bluestore HDDs. This would help speed up the data movement.

If it's a hybrid setup, I'm sure you could apply similar tweaks. Sleep
is already 0 for SSDs but you may be able to increase max_backfills
for some gains.
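
For example (assuming a release new enough to have the centralized
config database; on older releases injectargs does the same thing):

    # Apply to all OSDs via the monitors' config store:
    ceph config set osd osd_recovery_sleep_hdd 0.01
    ceph config set osd osd_max_backfills 6

    # Or push the change to the running OSDs immediately:
    ceph tell osd.* injectargs '--osd_max_backfills 6'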

Josh

On Tue, Aug 30, 2022 at 7:31 AM Wyll Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>
>
> Yes, this cluster has both: a large CephFS filesystem (60TB) that is replicated (2-copy) and a really large RGW data pool that is EC (12+4).  We cannot currently delete any data from either of them because the commands to access them are unresponsive: the cephfs will not mount, and radosgw-admin just hangs.
>
> We have several OSDs that are >99% full and keep approaching 100%, even after reweighting them to 0.  There is no client activity in this cluster at this point (it's dead), but there is lots of rebalancing and repair going on, so data is moving around.
>
> We are currently trying to use upmap commands to relocate PGs in an attempt to balance things better and get data moving again, but progress is glacially slow.
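>
> For reference, the upmap commands look like this (the PG and OSD ids are just examples):
>
>     ceph osd pg-upmap-items 11.3f 241 123
>
> which remaps PG 11.3f's copy from osd.241 to osd.123.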
>
> ________________________________
> From: Dave Schulz <dschulz@xxxxxxxxxxx>
> Sent: Monday, August 29, 2022 10:42 PM
> To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> Subject: Re:  Re: OSDs growing beyond full ratio
>
> Hi Wyll,
>
> Any chance you're using CephFS and have some really large files in the
> CephFS filesystem?  Erasure coding? I recently encountered a similar
> problem and as soon as the end-user deleted the really large files our
> problem became much more manageable.
>
> I had issues reweighting OSDs too, and in the end I changed the crush
> weights instead: every couple of days I'd chase them around, reweighting
> the OSDs that were over 70% full to zero and then setting them back to 12
> when they were mostly empty (12TB spinning rust buckets).  Note that I'm
> really not recommending this course of action; it's just the only option
> that seemed to have any effect.
>
> -Dave
>
> On 2022-08-29 3:00 p.m., Wyll Ingersoll wrote:
> >
> >
> >
> > Can anyone explain why OSDs (ceph pacific, bluestore osds) continue to grow well after they have exceeded the "full" level (95%), and is there any way to stop this?
> >
> > "The full_ratio is 0.95 but we have several osds that continue to grow and are approaching 100% utilization.  They are reweighted to almost 0, but yet continue to grow. > > Why is this happening?  I thought the cluster would stop writing to the osd when it was at above the full ratio."
> >
> > thanks...
> >
> > ________________________________
> > From: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
> > Sent: Monday, August 29, 2022 9:24 AM
> > To: Jarett <starkruzr@xxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> > Subject:  Re: OSDs growing beyond full ratio
> >
> >
> > I would think so, but it isn't happening nearly fast enough.
> >
> > It's literally been over 10 days since we added 40 new drives across 2 new servers, and they barely have any PGs yet: a few, but not nearly enough to help with the imbalance.
> > ________________________________
> > From: Jarett <starkruzr@xxxxxxxxx>
> > Sent: Sunday, August 28, 2022 8:19 PM
> > To: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> > Subject: RE:  OSDs growing beyond full ratio
> >
> >
> > Isn’t rebalancing onto the empty OSDs default behavior?
> >
> >
> >
> > From: Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx>
> > Sent: Sunday, August 28, 2022 10:31 AM
> > To: ceph-users@xxxxxxx
> > Subject:  OSDs growing beyond full ratio
> >
> >
> >
> > We have a pacific cluster that is overly filled and is having major trouble recovering.  We are desperate for help in improving recovery speed.  We have modified all of the various recovery throttling parameters.
> >
> >
> >
> > The full_ratio is 0.95 but we have several osds that continue to grow and are approaching 100% utilization.  They are reweighted to almost 0, yet continue to grow.
> >
> > Why is this happening?  I thought the cluster would stop writing to the osd when it was above the full ratio.
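> >
> > (For reference, the ratios currently in effect can be checked with "ceph osd dump | grep ratio", and can be raised temporarily to regain headroom; the values below are just examples:
> >
> >     ceph osd set-full-ratio 0.97
> >     ceph osd set-backfillfull-ratio 0.95
> > )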
> >
> >
> >
> >
> >
> > We have added additional capacity to the cluster, but the new OSDs are being used very slowly.  The primary pool in the cluster is the RGW data pool, a 12+4 EC pool using "host" placement rules across 18 hosts.  Two new hosts with 20x10TB OSDs each were recently added, but they are filling up only very slowly, and I don't see how to force recovery on that particular pool.  From what I understand, we cannot modify the EC parameters without destroying the pool, and we cannot offload that pool to any others because there is no other place to store that amount of data.
> >
> >
> >
> >
> >
> > We have been running "ceph osd reweight-by-utilization" periodically, and it works for a while (a few hours), but then the recovery and backfill IO numbers drop to negligible values.
> >
> >
> >
> > The balancer module will not run because the current misplaced % is about 97%.
> >
> >
> >
> > Would it be more effective to use osdmaptool to generate a bunch of upmap commands to manually move data around, or to keep trying to get reweight-by-utilization to work?
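> >
> > (The osdmaptool workflow would be something like the following; the --upmap-max value is just an example:
> >
> >     ceph osd getmap -o om
> >     osdmaptool om --upmap upmap.sh --upmap-max 100
> >     bash upmap.sh
> > )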
> >
> >
> >
> > Any suggestions, other than deleting data (which we cannot do at this point; the pools are not accessible) or adding more storage (we already did that, and for some reason it is not being utilized very heavily yet)?
> >
> >
> >
> >
> >
> >
> >
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



