Hi all,

can somebody confirm whether I should put this in a ticket, or whether this is intended (but very unexpected) behaviour?

We have some pools which gain a factor of three by compression:

POOL  ID  STORED   OBJECTS  USED     %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY    USED COMPR  UNDER COMPR
rbd   2   1.2 TiB  472.44k  1.8 TiB  35.24  1.1 TiB    N/A            N/A          472.44k  717 GiB     2.1 TiB

As of now, this always leads to a health warning from the pg-autoscaler as soon as the cluster is 33 % filled, since it thinks the subtree is overcommitted:

POOL                      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
default.rgw.buckets.data  61358M               3.0   5952G         0.0302  0.0700        1.0   32                  on
rbd                       1856G                3.0   5952G         0.9359  0.9200        1.0   256                 on

Cheers,
Oliver
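P.S.: to make the arithmetic explicit with the rbd pool numbers above, here is a minimal sketch, under the assumption (I have not checked the pg_autoscaler source) that the SIZE column tracks the pool's USED bytes rather than STORED:

# Back-of-the-envelope check, numbers taken from the outputs above (all in GiB).
# Assumption: the autoscaler's SIZE is the pool's USED value, which already
# reflects the compression savings, and it still gets multiplied by RATE.

raw_capacity_gib = 5952        # RAW CAPACITY from autoscale-status
rate = 3.0                     # 3x replicated rbd pool

used_gib = 1856                # USED / SIZE (~1.8 TiB after compression)
stored_gib = 1.2 * 1024        # STORED (~1.2 TiB of logical data)

autoscaler_ratio = used_gib * rate / raw_capacity_gib       # ~0.94, the reported RATIO
fill_fraction = used_gib / raw_capacity_gib                 # ~0.31, roughly the 33 % fill level
stored_fraction = stored_gib * rate / raw_capacity_gib      # ~0.62 even ignoring compression

print(f"RATIO as reported:          {autoscaler_ratio:.4f}")
print(f"fill level of the cluster:  {fill_fraction:.4f}")
print(f"ratio based on STORED:      {stored_fraction:.4f}")

If that assumption holds, a pool that compresses roughly 3:1 will always look about three times bigger to the autoscaler than it really is on disk, which would match the warning appearing at roughly 33 % fill.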
On 12.09.19 at 23:34, Oliver Freyermuth wrote:

Dear Cephalopodians,

I can confirm the same problem described by Joe Ryner in 14.2.2. I'm also getting (in a small test setup):

-----------------------------------------------------
# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
    Pools ['rbd', '.rgw.root', 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log', 'default.rgw.buckets.index', 'default.rgw.buckets.data'] overcommit available storage by 1.068x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
    Pools ['rbd', '.rgw.root', 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log', 'default.rgw.buckets.index', 'default.rgw.buckets.data'] overcommit available storage by 1.068x due to target_size_ratio 0.000 on pools []
-----------------------------------------------------

However, there's not much actual data STORED:

-----------------------------------------------------
# ceph df
RAW STORAGE:
    CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
    hdd    4.0 TiB  2.6 TiB  1.4 TiB  1.4 TiB   35.94
    TOTAL  4.0 TiB  2.6 TiB  1.4 TiB  1.4 TiB   35.94

POOLS:
    POOL                       ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    rbd                        2   676 GiB  266.40k  707 GiB  23.42  771 GiB
    .rgw.root                  9   1.2 KiB  4        768 KiB  0      771 GiB
    default.rgw.control        10  0 B      8        0 B      0      771 GiB
    default.rgw.meta           11  1.2 KiB  8        1.3 MiB  0      771 GiB
    default.rgw.log            12  0 B      175      0 B      0      771 GiB
    default.rgw.buckets.index  13  0 B      1        0 B      0      771 GiB
    default.rgw.buckets.data   14  249 GiB  99.62k   753 GiB  24.57  771 GiB
-----------------------------------------------------

The main culprit here seems to be the default.rgw.buckets.data pool, but the rbd pool also contains thin images. As in Joe's case, the autoscaler seems to look at the "USED" space, not at the "STORED" bytes:

-----------------------------------------------------
POOL                       SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
default.rgw.meta           1344k                3.0   4092G         0.0000                1.0   8                   on
default.rgw.buckets.index  0                    3.0   4092G         0.0000                1.0   8                   on
default.rgw.control        0                    3.0   4092G         0.0000                1.0   8                   on
default.rgw.buckets.data   788.6G               3.0   4092G         0.5782                1.0   128                 on
.rgw.root                  768.0k               3.0   4092G         0.0000                1.0   8                   on
rbd                        710.8G               3.0   4092G         0.5212                1.0   64                  on
default.rgw.log            0                    3.0   4092G         0.0000                1.0   8                   on
-----------------------------------------------------

This does seem like a bug to me: the warning fires on a cluster with only 35 % raw usage, and things are mostly balanced. Is there already a tracker entry for this?

Cheers,
Oliver

On 2019-05-01 22:01, Joe Ryner wrote:

I think I have figured out the issue.

POOL    SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
images  28523G               3.0   68779G        1.2441                1000                warn

My images pool is 28523G with a replication level of 3, and there is a total of 68779G of raw capacity. According to the documentation (http://docs.ceph.com/docs/master/rados/operations/placement-groups/):

"*SIZE* is the amount of data stored in the pool. *TARGET SIZE*, if present, is the amount of data the administrator has specified that they expect to eventually be stored in this pool. The system uses the larger of the two values for its calculation. *RATE* is the multiplier for the pool that determines how much raw storage capacity is consumed. For example, a 3 replica pool will have a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5.
*RAW CAPACITY* is the total amount of raw storage capacity on the OSDs that are responsible for storing this pool's (and perhaps other pools') data. *RATIO* is the ratio of that total capacity that this pool is consuming (i.e., ratio = size * rate / raw capacity)."

So ratio = 28523G * 3.0 / 68779G = 1.2441x. I'm oversubscribing by 1.2441x, thus the warning. But ... looking at

# ceph df
POOL    ID  STORED   OBJECTS  USED    %USED  MAX AVAIL
images  3   9.3 TiB  2.82M    28 TiB  57.94  6.7 TiB

I believe the 9.3 TiB is the amount I actually have, thinly provisioned, versus a fully provisioned 28 TiB? The raw capacity of the cluster is sitting at about 50 % used.

Shouldn't the ratio be STORED (from ceph df) * the pool's replication size / raw capacity, since ceph uses thin provisioning in rbd? Otherwise this ratio will only work for people who don't thin provision, which goes against what ceph is doing with rbd: http://docs.ceph.com/docs/master/rbd/

On Wed, May 1, 2019 at 11:44 AM Joe Ryner <jryner@xxxxxxxx> wrote:

I have found a little more information. When I turn off the pg_autoscaler the warning goes away; turn it back on and the warning comes back. I have run the following:

# ceph osd pool autoscale-status
POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
images    28523G               3.0   68779G        1.2441                1000                warn
locks     676.5M               3.0   68779G        0.0000                8                   warn
rbd       0                    3.0   68779G        0.0000                8                   warn
data      0                    3.0   68779G        0.0000                8                   warn
metadata  3024k                3.0   68779G        0.0000                8                   warn

# ceph df
RAW STORAGE:
    CLASS  SIZE    AVAIL    USED     RAW USED  %RAW USED
    hdd    51 TiB  26 TiB   24 TiB   24 TiB    48.15
    ssd    17 TiB  8.5 TiB  8.1 TiB  8.1 TiB   48.69
    TOTAL  67 TiB  35 TiB   32 TiB   32 TiB    48.28

POOLS:
    POOL      ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    data      0   0 B      0        0 B      0      6.7 TiB
    metadata  1   6.3 KiB  21       3.0 MiB  0      6.7 TiB
    rbd       2   0 B      2        0 B      0      6.7 TiB
    images    3   9.3 TiB  2.82M    28 TiB   57.94  6.7 TiB
    locks     4   215 MiB  517      677 MiB  0      6.7 TiB

It looks to me like the numbers for the images pool are not right in the autoscale-status output.
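Spelling that out with the images pool numbers (a minimal sketch with values taken from the outputs above; the STORED-based line is what I would expect it to use, not what the autoscaler currently does):

# Numbers from the ceph df / autoscale-status output above (all in GiB).
raw_capacity = 68779          # RAW CAPACITY
rate = 3.0                    # images is a 3x replicated pool

size_used = 28523             # SIZE in autoscale-status, matching USED (~28 TiB)
stored = 9.3 * 1024           # STORED in ceph df (~9.3 TiB of thin-provisioned data)

current_ratio = size_used * rate / raw_capacity   # 1.2441 -> "overcommitted" warning
stored_ratio = stored * rate / raw_capacity       # ~0.42, closer to the real footprint

print(f"ratio as computed now:  {current_ratio:.4f}")
print(f"ratio based on STORED:  {stored_ratio:.4f}")

A STORED-based ratio would be much closer to the roughly 50 % raw usage the cluster actually reports.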
Below is an osd crush tree:

# ceph osd crush tree
ID  CLASS  WEIGHT    (compat)  TYPE NAME
 -1        66.73337            root default
 -3        22.28214  22.28214      rack marack
 -8         7.27475   7.27475          host abacus
 19   hdd   1.81879   1.81879              osd.19
 20   hdd   1.81879   1.42563              osd.20
 21   hdd   1.81879   1.81879              osd.21
 50   hdd   1.81839   1.81839              osd.50
-10         7.76500   6.67049          host gold
  7   hdd   0.86299   0.83659              osd.7
  9   hdd   0.86299   0.78972              osd.9
 10   hdd   0.86299   0.72031              osd.10
 14   hdd   0.86299   0.65315              osd.14
 15   hdd   0.86299   0.72586              osd.15
 22   hdd   0.86299   0.80528              osd.22
 23   hdd   0.86299   0.63741              osd.23
 24   hdd   0.86299   0.77718              osd.24
 25   hdd   0.86299   0.72499              osd.25
 -5         7.24239   7.24239          host hassium
  0   hdd   1.80800   1.52536              osd.0
  1   hdd   1.80800   1.65421              osd.1
 26   hdd   1.80800   1.65140              osd.26
 51   hdd   1.81839   1.81839              osd.51
 -2        21.30070  21.30070      rack marack2
-12         7.76999   8.14474          host hamms
 27   ssd   0.86299   0.99367              osd.27
 28   ssd   0.86299   0.95961              osd.28
 29   ssd   0.86299   0.80768              osd.29
 30   ssd   0.86299   0.86893              osd.30
 31   ssd   0.86299   0.92583              osd.31
 32   ssd   0.86299   1.00227              osd.32
 33   ssd   0.86299   0.73099              osd.33
 34   ssd   0.86299   0.80766              osd.34
 35   ssd   0.86299   1.04811              osd.35
 -7         5.45636   5.45636          host parabola
  5   hdd   1.81879   1.81879              osd.5
 12   hdd   1.81879   1.81879              osd.12
 13   hdd   1.81879   1.81879              osd.13
 -6         2.63997   3.08183          host radium
  2   hdd   0.87999   1.05594              osd.2
  6   hdd   0.87999   1.10501              osd.6
 11   hdd   0.87999   0.92088              osd.11
 -9         5.43439   5.43439          host splinter
 16   hdd   1.80800   1.80800              osd.16
 17   hdd   1.81839   1.81839              osd.17
 18   hdd   1.80800   1.80800              osd.18
-11        23.15053  23.15053      rack marack3
-13         8.63300   8.98921          host helm
 36   ssd   0.86299   0.71931              osd.36
 37   ssd   0.86299   0.92601              osd.37
 38   ssd   0.86299   0.79585              osd.38
 39   ssd   0.86299   1.08521              osd.39
 40   ssd   0.86299   0.89500              osd.40
 41   ssd   0.86299   0.92351              osd.41
 42   ssd   0.86299   0.89690              osd.42
 43   ssd   0.86299   0.92480              osd.43
 44   ssd   0.86299   0.84467              osd.44
 45   ssd   0.86299   0.97795              osd.45
-40         7.27515   7.89609          host samarium
 46   hdd   1.81879   1.90242              osd.46
 47   hdd   1.81879   1.86723              osd.47
 48   hdd   1.81879   1.93404              osd.48
 49   hdd   1.81879   2.19240              osd.49
 -4         7.24239   7.24239          host scandium
  3   hdd   1.80800   1.76680              osd.3
  4   hdd   1.80800   1.80800              osd.4
  8   hdd   1.80800   1.80800              osd.8
 52   hdd   1.81839   1.81839              osd.52

Any ideas?

On Wed, May 1, 2019 at 9:32 AM Joe Ryner <jryner@xxxxxxxx> wrote:

Hi,

I have an old ceph cluster and have recently upgraded from Luminous to Nautilus. After converting to Nautilus I decided it was time to convert to bluestore. Before the conversion the cluster was healthy, but afterwards I have a HEALTH_WARN:

# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
    Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
    Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_ratio 0.000 on pools []

I started with a target_size_ratio of 0.85 on the images pool and reduced it to 0 to hopefully make the warning go away. The cluster seems to be running fine; I just can't figure out what the problem is and how to make the message go away. I restarted the monitors this morning in hopes of fixing it.

Anyone have any ideas?
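For reference, my rough mental model of the check behind this warning (a sketch only; the names and structure are hypothetical, not taken from the pg_autoscaler module): each pool in the subtree counts as the larger of its current usage and its target size, and the warning fires when the sum exceeds the raw capacity.

# Rough, hypothetical model of the overcommit check (all sizes in GiB).
def overcommit_factor(pools, raw_capacity):
    total = 0.0
    for p in pools:
        # A pool counts as the larger of what it uses now (SIZE * RATE) and
        # what the admin says it will eventually use (target bytes or ratio).
        effective = max(
            p["size"] * p["rate"],
            p.get("target_bytes", 0),
            p.get("target_ratio", 0.0) * raw_capacity,
        )
        total += effective
    return total / raw_capacity   # > 1.0 would trigger the HEALTH_WARN

pools = [
    {"size": 28523, "rate": 3.0, "target_ratio": 0.0},   # images, ratio already set to 0
    {"size": 676.5 / 1024, "rate": 3.0},                  # locks
    {"size": 3024 / (1024 * 1024), "rate": 3.0},          # metadata
    # rbd and data are empty
]
print(f"overcommit factor: {overcommit_factor(pools, 68779):.3f}x")   # ~1.244x

If that model is right, lowering target_size_ratio cannot clear the warning as long as the pools' current usage, as the autoscaler computes it, already adds up past the raw capacity.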
Thanks in advance

--
Joe Ryner
Associate Director
Center for the Application of Information Technologies (CAIT) - http://www.cait.org
Western Illinois University - http://www.wiu.edu
P: (309) 298-1804 F: (309) 298-2806