I think I have figured out the issue.
POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
images    28523G               3.0   68779G        1.2441                1000                warn
My images pool is 28523G with a replication level of 3, and there is a total of 68779G of raw capacity.
According to the documentation http://docs.ceph.com/docs/master/rados/operations/placement-groups/
"SIZE is the amount of data stored in the pool. TARGET SIZE, if present, is the amount of data the administrator has specified that they expect to eventually be stored in this pool. The system uses the larger of the two values for its calculation.
RATE is the multiplier for the pool that determines how much raw storage capacity is consumed. For example, a 3 replica pool will have a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5.
RAW CAPACITY is the total amount of raw storage capacity on the OSDs that are responsible for storing this pool’s (and perhaps other pools’) data. RATIO is the ratio of that total capacity that this pool is consuming (i.e., ratio = size * rate / raw capacity)."
So ratio = 28523G * 3.0 / 68779G = 1.2441x
So I'm oversubscribing by 1.2441x, thus the warning.
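To double-check that, here's a quick Python sketch (nothing but the arithmetic, with the numbers copied from the autoscale-status output above) reproducing the documented formula:

    # ratio = SIZE * RATE / RAW CAPACITY, per the placement-groups docs
    size_g = 28523          # SIZE of the images pool, in GiB
    rate = 3.0              # RATE for a 3x replicated pool
    raw_capacity_g = 68779  # RAW CAPACITY of the OSDs backing the pool, in GiB

    ratio = size_g * rate / raw_capacity_g
    print(round(ratio, 4))  # 1.2441, matching the RATIO column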
But ... looking at the output of ceph df:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
images 3 9.3 TiB 2.82M 28 TiB 57.94 6.7 TiB
I believe the 9.3 TiB is the amount of data actually written (thin provisioned), versus a fully provisioned 28 TiB?
The raw capacity of the cluster is sitting at about 50% used.
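The numbers at least hang together; here is a quick sanity check in Python (figures taken from the ceph df output quoted further down in the thread):

    # 9.3 TiB STORED at 3x replication is roughly the 28 TiB shown as USED,
    # and raw usage across the cluster is just under half.
    stored_tib = 9.3       # STORED for the images pool
    replicas = 3           # pool replication size
    print(round(stored_tib * replicas, 1))         # 27.9, about the 28 TiB USED

    raw_used_tib = 32      # RAW USED across all OSDs
    raw_total_tib = 67     # total raw capacity
    print(round(raw_used_tib / raw_total_tib, 4))  # 0.4776, about 48% used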
Shouldn't the ratio instead be STORED (from ceph df) * RATE (from ceph osd pool autoscale-status) / RAW CAPACITY, since Ceph uses thin provisioning for rbd?
Otherwise, this ratio will only work for people who don't thin provision, which goes against what Ceph is doing with rbd:
http://docs.ceph.com/docs/master/rbd/
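To make the comparison concrete, here is the same arithmetic with STORED in place of SIZE (this is only the calculation I'm suggesting, not what the autoscaler actually does today):

    # Proposed: ratio = STORED * RATE / RAW CAPACITY
    stored_g = 9.3 * 1024   # STORED for images, converted from TiB to GiB
    rate = 3.0              # replication multiplier
    raw_capacity_g = 68779  # RAW CAPACITY, in GiB

    ratio = stored_g * rate / raw_capacity_g
    print(round(ratio, 4))  # 0.4154, in line with the ~48% raw usage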
On Wed, May 1, 2019 at 11:44 AM Joe Ryner <jryner@xxxxxxxx> wrote:
I have found a little more information. When I turn off the pg_autoscaler the warning goes away; turn it back on and the warning comes back. I have run the following:

# ceph osd pool autoscale-status
POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
images    28523G               3.0   68779G        1.2441                1000                warn
locks     676.5M               3.0   68779G        0.0000                8                   warn
rbd       0                    3.0   68779G        0.0000                8                   warn
data      0                    3.0   68779G        0.0000                8                   warn
metadata  3024k                3.0   68779G        0.0000                8                   warn

# ceph df
RAW STORAGE:
    CLASS  SIZE    AVAIL    USED     RAW USED  %RAW USED
    hdd    51 TiB  26 TiB   24 TiB   24 TiB    48.15
    ssd    17 TiB  8.5 TiB  8.1 TiB  8.1 TiB   48.69
    TOTAL  67 TiB  35 TiB   32 TiB   32 TiB    48.28

POOLS:
    POOL      ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    data      0   0 B      0        0 B      0      6.7 TiB
    metadata  1   6.3 KiB  21       3.0 MiB  0      6.7 TiB
    rbd       2   0 B      2        0 B      0      6.7 TiB
    images    3   9.3 TiB  2.82M    28 TiB   57.94  6.7 TiB
    locks     4   215 MiB  517      677 MiB  0      6.7 TiB

It looks to me like the numbers for the images pool are not right in the autoscale-status output. Below is the osd crush tree:

# ceph osd crush tree
ID  CLASS  WEIGHT    (compat)  TYPE NAME
 -1        66.73337            root default
 -3        22.28214  22.28214      rack marack
 -8         7.27475   7.27475          host abacus
 19   hdd   1.81879   1.81879              osd.19
 20   hdd   1.81879   1.42563              osd.20
 21   hdd   1.81879   1.81879              osd.21
 50   hdd   1.81839   1.81839              osd.50
-10         7.76500   6.67049          host gold
  7   hdd   0.86299   0.83659              osd.7
  9   hdd   0.86299   0.78972              osd.9
 10   hdd   0.86299   0.72031              osd.10
 14   hdd   0.86299   0.65315              osd.14
 15   hdd   0.86299   0.72586              osd.15
 22   hdd   0.86299   0.80528              osd.22
 23   hdd   0.86299   0.63741              osd.23
 24   hdd   0.86299   0.77718              osd.24
 25   hdd   0.86299   0.72499              osd.25
 -5         7.24239   7.24239          host hassium
  0   hdd   1.80800   1.52536              osd.0
  1   hdd   1.80800   1.65421              osd.1
 26   hdd   1.80800   1.65140              osd.26
 51   hdd   1.81839   1.81839              osd.51
 -2        21.30070  21.30070      rack marack2
-12         7.76999   8.14474          host hamms
 27   ssd   0.86299   0.99367              osd.27
 28   ssd   0.86299   0.95961              osd.28
 29   ssd   0.86299   0.80768              osd.29
 30   ssd   0.86299   0.86893              osd.30
 31   ssd   0.86299   0.92583              osd.31
 32   ssd   0.86299   1.00227              osd.32
 33   ssd   0.86299   0.73099              osd.33
 34   ssd   0.86299   0.80766              osd.34
 35   ssd   0.86299   1.04811              osd.35
 -7         5.45636   5.45636          host parabola
  5   hdd   1.81879   1.81879              osd.5
 12   hdd   1.81879   1.81879              osd.12
 13   hdd   1.81879   1.81879              osd.13
 -6         2.63997   3.08183          host radium
  2   hdd   0.87999   1.05594              osd.2
  6   hdd   0.87999   1.10501              osd.6
 11   hdd   0.87999   0.92088              osd.11
 -9         5.43439   5.43439          host splinter
 16   hdd   1.80800   1.80800              osd.16
 17   hdd   1.81839   1.81839              osd.17
 18   hdd   1.80800   1.80800              osd.18
-11        23.15053  23.15053      rack marack3
-13         8.63300   8.98921          host helm
 36   ssd   0.86299   0.71931              osd.36
 37   ssd   0.86299   0.92601              osd.37
 38   ssd   0.86299   0.79585              osd.38
 39   ssd   0.86299   1.08521              osd.39
 40   ssd   0.86299   0.89500              osd.40
 41   ssd   0.86299   0.92351              osd.41
 42   ssd   0.86299   0.89690              osd.42
 43   ssd   0.86299   0.92480              osd.43
 44   ssd   0.86299   0.84467              osd.44
 45   ssd   0.86299   0.97795              osd.45
-40         7.27515   7.89609          host samarium
 46   hdd   1.81879   1.90242              osd.46
 47   hdd   1.81879   1.86723              osd.47
 48   hdd   1.81879   1.93404              osd.48
 49   hdd   1.81879   2.19240              osd.49
 -4         7.24239   7.24239          host scandium
  3   hdd   1.80800   1.76680              osd.3
  4   hdd   1.80800   1.80800              osd.4
  8   hdd   1.80800   1.80800              osd.8
 52   hdd   1.81839   1.81839              osd.52

Any ideas?

On Wed, May 1, 2019 at 9:32 AM Joe Ryner <jryner@xxxxxxxx> wrote:

Hi,

I have an old ceph cluster and have upgraded recently from Luminous to Nautilus.
After converting to Nautilus I decided it was time to convert to bluestore. Before I converted the cluster was healthy, but afterwards I have a HEALTH_WARN:

# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
    Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
    Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_ratio 0.000 on pools []

I started with a target_size_ratio of 0.85 on the images pool and reduced it to 0 to hopefully get the warning to go away. The cluster seems to be running fine; I just can't figure out what the problem is and how to make the message go away. I restarted the monitors this morning in hopes of fixing it. Anyone have any ideas?

Thanks in advance
Joe Ryner
Associate Director
Center for the Application of Information Technologies (CAIT) - http://www.cait.org
Western Illinois University - http://www.wiu.edu
P: (309) 298-1804
F: (309) 298-2806
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com