Hi all,

can somebody confirm whether I should put this in a ticket, or whether this is intended (but very unexpected) behaviour?

We have some pools which gain a factor of three by compression:

POOL  ID  STORED   OBJECTS  USED     %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY    USED COMPR  UNDER COMPR
rbd   2   1.2 TiB  472.44k  1.8 TiB  35.24  1.1 TiB    N/A            N/A          472.44k  717 GiB     2.1 TiB

As of now, this always leads to a health warning from the pg-autoscaler as soon as the cluster is 33 % filled, since it thinks the subtree is overcommitted:

POOL                      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
default.rgw.buckets.data  61358M               3.0   5952G         0.0302  0.0700        1.0   32                  on
rbd                       1856G                3.0   5952G         0.9359  0.9200        1.0   256                 on

Cheers,
Oliver
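P.S.: to make the arithmetic explicit with the rbd pool numbers above, here is a minimal sketch, under the assumption (I have not checked the pg_autoscaler source) that the SIZE column tracks the pool's USED bytes rather than STORED:

# Back-of-the-envelope check, numbers taken from the outputs above (all in GiB).
# Assumption: the autoscaler's SIZE is the pool's USED value, which already
# reflects the compression savings, and it still gets multiplied by RATE.

raw_capacity_gib = 5952        # RAW CAPACITY from autoscale-status
rate = 3.0                     # 3x replicated rbd pool

used_gib = 1856                # USED / SIZE (~1.8 TiB after compression)
stored_gib = 1.2 * 1024        # STORED (~1.2 TiB of logical data)

autoscaler_ratio = used_gib * rate / raw_capacity_gib       # ~0.94, the reported RATIO
fill_fraction = used_gib / raw_capacity_gib                 # ~0.31, roughly the 33 % fill level
stored_fraction = stored_gib * rate / raw_capacity_gib      # ~0.62 even ignoring compression

print(f"RATIO as reported:          {autoscaler_ratio:.4f}")
print(f"fill level of the cluster:  {fill_fraction:.4f}")
print(f"ratio based on STORED:      {stored_fraction:.4f}")

If that assumption holds, a pool that compresses roughly 3:1 will always look about three times bigger to the autoscaler than it really is on disk, which would match the warning appearing at roughly 33 % fill.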
On 12.09.19 at 23:34, Oliver Freyermuth wrote:

Dear Cephalopodians,

I can confirm the same problem described by Joe Ryner in 14.2.2. I'm also getting (in a small test setup):

-----------------------------------------------------
# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
    Pools ['rbd', '.rgw.root', 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log', 'default.rgw.buckets.index', 'default.rgw.buckets.data'] overcommit available storage by 1.068x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
    Pools ['rbd', '.rgw.root', 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log', 'default.rgw.buckets.index', 'default.rgw.buckets.data'] overcommit available storage by 1.068x due to target_size_ratio 0.000 on pools []
-----------------------------------------------------

However, there's not much actual data STORED:

-----------------------------------------------------
# ceph df
RAW STORAGE:
    CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
    hdd    4.0 TiB  2.6 TiB  1.4 TiB  1.4 TiB   35.94
    TOTAL  4.0 TiB  2.6 TiB  1.4 TiB  1.4 TiB   35.94

POOLS:
    POOL                       ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    rbd                        2   676 GiB  266.40k  707 GiB  23.42  771 GiB
    .rgw.root                  9   1.2 KiB  4        768 KiB  0      771 GiB
    default.rgw.control        10  0 B      8        0 B      0      771 GiB
    default.rgw.meta           11  1.2 KiB  8        1.3 MiB  0      771 GiB
    default.rgw.log            12  0 B      175      0 B      0      771 GiB
    default.rgw.buckets.index  13  0 B      1        0 B      0      771 GiB
    default.rgw.buckets.data   14  249 GiB  99.62k   753 GiB  24.57  771 GiB
-----------------------------------------------------

The main culprit here seems to be the default.rgw.buckets.data pool, but the rbd pool also contains thin images. As in Joe's case, the autoscaler seems to look at the "USED" space, not at the "STORED" bytes:

-----------------------------------------------------
POOL                       SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
default.rgw.meta           1344k                3.0   4092G         0.0000                1.0   8                   on
default.rgw.buckets.index  0                    3.0   4092G         0.0000                1.0   8                   on
default.rgw.control        0                    3.0   4092G         0.0000                1.0   8                   on
default.rgw.buckets.data   788.6G               3.0   4092G         0.5782                1.0   128                 on
.rgw.root                  768.0k               3.0   4092G         0.0000                1.0   8                   on
rbd                        710.8G               3.0   4092G         0.5212                1.0   64                  on
default.rgw.log            0                    3.0   4092G         0.0000                1.0   8                   on
-----------------------------------------------------

This does seem like a bug to me: the warning fires on a cluster with only 35 % raw usage, and things are mostly balanced. Is there already a tracker entry for this?

Cheers,
Oliver

On 2019-05-01 22:01, Joe Ryner wrote:

I think I have figured out the issue.

POOL    SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
images  28523G               3.0   68779G        1.2441                1000                warn

My images pool is 28523G with a replication level of 3, and there is a total of 68779G of raw capacity. According to the documentation (http://docs.ceph.com/docs/master/rados/operations/placement-groups/):

"*SIZE* is the amount of data stored in the pool. *TARGET SIZE*, if present, is the amount of data the administrator has specified that they expect to eventually be stored in this pool. The system uses the larger of the two values for its calculation. *RATE* is the multiplier for the pool that determines how much raw storage capacity is consumed. For example, a 3 replica pool will have a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5.
*RAW CAPACITY* is the total amount of raw storage capacity on the OSDs that are responsible for storing this pool's (and perhaps other pools') data. *RATIO* is the ratio of that total capacity that this pool is consuming (i.e., ratio = size * rate / raw capacity)."

So ratio = 28523G * 3.0 / 68779G = 1.2441x. I'm oversubscribing by 1.2441x, thus the warning. But ... looking at

# ceph df
POOL    ID  STORED   OBJECTS  USED    %USED  MAX AVAIL
images  3   9.3 TiB  2.82M    28 TiB  57.94  6.7 TiB

I believe the 9.3 TiB is the amount I actually have, thinly provisioned, versus a fully provisioned 28 TiB? The raw capacity of the cluster is sitting at about 50 % used.

Shouldn't the ratio be STORED (from ceph df) * the pool's replication size / raw capacity, since ceph uses thin provisioning in rbd? Otherwise this ratio will only work for people who don't thin provision, which goes against what ceph is doing with rbd: http://docs.ceph.com/docs/master/rbd/

On Wed, May 1, 2019 at 11:44 AM Joe Ryner <jryner@xxxxxxxx> wrote:

I have found a little more information. When I turn off the pg_autoscaler the warning goes away; turn it back on and the warning comes back. I have run the following:

# ceph osd pool autoscale-status
POOL      SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE
images    28523G               3.0   68779G        1.2441                1000                warn
locks     676.5M               3.0   68779G        0.0000                8                   warn
rbd       0                    3.0   68779G        0.0000                8                   warn
data      0                    3.0   68779G        0.0000                8                   warn
metadata  3024k                3.0   68779G        0.0000                8                   warn

# ceph df
RAW STORAGE:
    CLASS  SIZE    AVAIL    USED     RAW USED  %RAW USED
    hdd    51 TiB  26 TiB   24 TiB   24 TiB    48.15
    ssd    17 TiB  8.5 TiB  8.1 TiB  8.1 TiB   48.69
    TOTAL  67 TiB  35 TiB   32 TiB   32 TiB    48.28

POOLS:
    POOL      ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    data      0   0 B      0        0 B      0      6.7 TiB
    metadata  1   6.3 KiB  21       3.0 MiB  0      6.7 TiB
    rbd       2   0 B      2        0 B      0      6.7 TiB
    images    3   9.3 TiB  2.82M    28 TiB   57.94  6.7 TiB
    locks     4   215 MiB  517      677 MiB  0      6.7 TiB

It looks to me like the numbers for the images pool are not right in the autoscale-status output.
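Spelling that out with the images pool numbers (a minimal sketch with values taken from the outputs above; the STORED-based line is what I would expect it to use, not what the autoscaler currently does):

# Numbers from the ceph df / autoscale-status output above (all in GiB).
raw_capacity = 68779          # RAW CAPACITY
rate = 3.0                    # images is a 3x replicated pool

size_used = 28523             # SIZE in autoscale-status, matching USED (~28 TiB)
stored = 9.3 * 1024           # STORED in ceph df (~9.3 TiB of thin-provisioned data)

current_ratio = size_used * rate / raw_capacity   # 1.2441 -> "overcommitted" warning
stored_ratio = stored * rate / raw_capacity       # ~0.42, closer to the real footprint

print(f"ratio as computed now:  {current_ratio:.4f}")
print(f"ratio based on STORED:  {stored_ratio:.4f}")

A STORED-based ratio would be much closer to the roughly 50 % raw usage the cluster actually reports.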
Below is an osd crush tree:

# ceph osd crush tree
ID  CLASS  WEIGHT    (compat)  TYPE NAME
 -1        66.73337            root default
 -3        22.28214  22.28214      rack marack
 -8         7.27475   7.27475          host abacus
 19   hdd   1.81879   1.81879              osd.19
 20   hdd   1.81879   1.42563              osd.20
 21   hdd   1.81879   1.81879              osd.21
 50   hdd   1.81839   1.81839              osd.50
-10         7.76500   6.67049          host gold
  7   hdd   0.86299   0.83659              osd.7
  9   hdd   0.86299   0.78972              osd.9
 10   hdd   0.86299   0.72031              osd.10
 14   hdd   0.86299   0.65315              osd.14
 15   hdd   0.86299   0.72586              osd.15
 22   hdd   0.86299   0.80528              osd.22
 23   hdd   0.86299   0.63741              osd.23
 24   hdd   0.86299   0.77718              osd.24
 25   hdd   0.86299   0.72499              osd.25
 -5         7.24239   7.24239          host hassium
  0   hdd   1.80800   1.52536              osd.0
  1   hdd   1.80800   1.65421              osd.1
 26   hdd   1.80800   1.65140              osd.26
 51   hdd   1.81839   1.81839              osd.51
 -2        21.30070  21.30070      rack marack2
-12         7.76999   8.14474          host hamms
 27   ssd   0.86299   0.99367              osd.27
 28   ssd   0.86299   0.95961              osd.28
 29   ssd   0.86299   0.80768              osd.29
 30   ssd   0.86299   0.86893              osd.30
 31   ssd   0.86299   0.92583              osd.31
 32   ssd   0.86299   1.00227              osd.32
 33   ssd   0.86299   0.73099              osd.33
 34   ssd   0.86299   0.80766              osd.34
 35   ssd   0.86299   1.04811              osd.35
 -7         5.45636   5.45636          host parabola
  5   hdd   1.81879   1.81879              osd.5
 12   hdd   1.81879   1.81879              osd.12
 13   hdd   1.81879   1.81879              osd.13
 -6         2.63997   3.08183          host radium
  2   hdd   0.87999   1.05594              osd.2
  6   hdd   0.87999   1.10501              osd.6
 11   hdd   0.87999   0.92088              osd.11
 -9         5.43439   5.43439          host splinter
 16   hdd   1.80800   1.80800              osd.16
 17   hdd   1.81839   1.81839              osd.17
 18   hdd   1.80800   1.80800              osd.18
-11        23.15053  23.15053      rack marack3
-13         8.63300   8.98921          host helm
 36   ssd   0.86299   0.71931              osd.36
 37   ssd   0.86299   0.92601              osd.37
 38   ssd   0.86299   0.79585              osd.38
 39   ssd   0.86299   1.08521              osd.39
 40   ssd   0.86299   0.89500              osd.40
 41   ssd   0.86299   0.92351              osd.41
 42   ssd   0.86299   0.89690              osd.42
 43   ssd   0.86299   0.92480              osd.43
 44   ssd   0.86299   0.84467              osd.44
 45   ssd   0.86299   0.97795              osd.45
-40         7.27515   7.89609          host samarium
 46   hdd   1.81879   1.90242              osd.46
 47   hdd   1.81879   1.86723              osd.47
 48   hdd   1.81879   1.93404              osd.48
 49   hdd   1.81879   2.19240              osd.49
 -4         7.24239   7.24239          host scandium
  3   hdd   1.80800   1.76680              osd.3
  4   hdd   1.80800   1.80800              osd.4
  8   hdd   1.80800   1.80800              osd.8
 52   hdd   1.81839   1.81839              osd.52

Any ideas?

On Wed, May 1, 2019 at 9:32 AM Joe Ryner <jryner@xxxxxxxx> wrote:

Hi,

I have an old ceph cluster and have recently upgraded from Luminous to Nautilus. After converting to Nautilus I decided it was time to convert to bluestore. Before the conversion the cluster was healthy, but afterwards I have a HEALTH_WARN:

# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
    Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
    Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_ratio 0.000 on pools []

I started with a target_size_ratio of 0.85 on the images pool and reduced it to 0 to hopefully make the warning go away. The cluster seems to be running fine; I just can't figure out what the problem is and how to make the message go away. I restarted the monitors this morning in hopes of fixing it.

Anyone have any ideas?
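For reference, my rough mental model of the check behind this warning (a sketch only; the names and structure are hypothetical, not taken from the pg_autoscaler module): each pool in the subtree counts as the larger of its current usage and its target size, and the warning fires when the sum exceeds the raw capacity.

# Rough, hypothetical model of the overcommit check (all sizes in GiB).
def overcommit_factor(pools, raw_capacity):
    total = 0.0
    for p in pools:
        # A pool counts as the larger of what it uses now (SIZE * RATE) and
        # what the admin says it will eventually use (target bytes or ratio).
        effective = max(
            p["size"] * p["rate"],
            p.get("target_bytes", 0),
            p.get("target_ratio", 0.0) * raw_capacity,
        )
        total += effective
    return total / raw_capacity   # > 1.0 would trigger the HEALTH_WARN

pools = [
    {"size": 28523, "rate": 3.0, "target_ratio": 0.0},   # images, ratio already set to 0
    {"size": 676.5 / 1024, "rate": 3.0},                  # locks
    {"size": 3024 / (1024 * 1024), "rate": 3.0},          # metadata
    # rbd and data are empty
]
print(f"overcommit factor: {overcommit_factor(pools, 68779):.3f}x")   # ~1.244x

If that model is right, lowering target_size_ratio cannot clear the warning as long as the pools' current usage, as the autoscaler computes it, already adds up past the raw capacity.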
Thanks in advance

--
Joe Ryner
Associate Director
Center for the Application of Information Technologies (CAIT) - http://www.cait.org
Western Illinois University - http://www.wiu.edu
P: (309) 298-1804 F: (309) 298-2806