Re: How to force PG merging in one step?

Hi Frank,

Is this not checked per OSD? This would be really bad, because if it just uses the average (currently 143.3) this warning will never be triggered in critical situations.

I believe you're right; I can only remember warnings about the average PG count per OSD, not the count on individual OSDs. I'm not aware of this being worked on; maybe a tracker issue would be helpful here.

Regards,
Eugen

Zitat von Frank Schilder <frans@xxxxxx>:

Hi Eugen,

the PG merge finished and I still observe that no PG warning shows up. We have

mgr advanced mon_max_pg_per_osd 300

and I have an OSD with 306 PGs. Still, no warning:

# ceph health detail
HEALTH_OK

Is this not checked per OSD? This would be really bad, because if it just uses the average (currently 143.3) this warning will never be triggered in critical situations. Our average PG count is dominated by a huge HDD pool with ca. 120 PGs/OSD. We have a number of smaller SSD pools where we go closer to the limit. The critical pool has 24 OSDs, and I would have to create thousands of PGs on these OSDs before the average crosses the threshold. In other words, if it's the average PG count that is used, this warning is almost always too late, because it's the small pools where too many PGs end up on OSDs, but these don't influence the average much.
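For reference, the per-OSD counts can be pulled out of "ceph osd df" with something like the sketch below (it assumes the PGS column is the second-to-last field and that OSD rows are the only lines starting with a numeric id; the column layout may differ between releases):

# ceph osd df | awk '$1 ~ /^[0-9]+$/ {print $1, $(NF-1)}' | sort -k2 -nr | head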

I was always wondering how users ended up with more than 1000 PGs per OSD by accident during recovery. It now makes more sense. If there is no per-OSD warning, this can easily happen.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 12 October 2022 17:11:02
To: Eugen Block
Cc: ceph-users@xxxxxxx
Subject:  Re: How to force PG merging in one step?

Hi Eugen.

During recovery there's another factor involved
(osd_max_pg_per_osd_hard_ratio), the default is 3. I had to deal with
that a few months back when I got inactive PGs due to many chunks and
"only" a factor of 3. In that specific cluster I increased it to 5 and
didn't encounter inactive PGs anymore.

Yes, I looked at this as well and I remember cases where people got stuck with temporary PG numbers being too high. This is precisely why I wanted to see this warning. If it's off during recovery, the only way to notice that something is going wrong is when you hit the hard limit. But then it's too late.

I actually wanted to see this during recovery to have an early warning sign. I purposefully did not increase pg_num_max to 500 to make sure that the warning shows up. I personally consider it really bad behaviour if recovery/rebalancing disables this warning. Recovery is the operation where exceeding a PG limit without knowing will hurt most.
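For the record, my understanding is that the hard limit during recovery works out to mon_max_pg_per_osd multiplied by osd_max_pg_per_osd_hard_ratio, so with mon_max_pg_per_osd at 300 and the default ratio of 3 that would be around 300 * 3 = 900 PGs per OSD before PGs start going inactive. The values actually in effect can be checked with something like the following (we have mon_max_pg_per_osd set on the mgr, the section may differ on other clusters):

# ceph config get mgr mon_max_pg_per_osd
# ceph config get osd osd_max_pg_per_osd_hard_ratio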

Thanks for the heads up. Probably need to watch my * a bit more with certain things.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


