Re: POOL_TARGET_SIZE_BYTES_OVERCOMMITTED

Dear Cephalopodians,

I can confirm the same problem described by Joe Ryner, on 14.2.2. In a small test setup, I'm also getting:
-----------------------------------------------------
# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
    Pools ['rbd', '.rgw.root', 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log', 'default.rgw.buckets.index', 'default.rgw.buckets.data'] overcommit available storage by 1.068x due to target_size_bytes    0  on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
    Pools ['rbd', '.rgw.root', 'default.rgw.control', 'default.rgw.meta', 'default.rgw.log', 'default.rgw.buckets.index', 'default.rgw.buckets.data'] overcommit available storage by 1.068x due to target_size_ratio 0.000 on pools []
-----------------------------------------------------

However, there's not much actual data STORED:
-----------------------------------------------------
# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED 
    hdd       4.0 TiB     2.6 TiB     1.4 TiB      1.4 TiB         35.94 
    TOTAL     4.0 TiB     2.6 TiB     1.4 TiB      1.4 TiB         35.94 
 
POOLS:
    POOL                          ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    rbd                            2     676 GiB     266.40k     707 GiB     23.42       771 GiB 
    .rgw.root                      9     1.2 KiB           4     768 KiB         0       771 GiB 
    default.rgw.control           10         0 B           8         0 B         0       771 GiB 
    default.rgw.meta              11     1.2 KiB           8     1.3 MiB         0       771 GiB 
    default.rgw.log               12         0 B         175         0 B         0       771 GiB 
    default.rgw.buckets.index     13         0 B           1         0 B         0       771 GiB 
    default.rgw.buckets.data      14     249 GiB      99.62k     753 GiB     24.57       771 GiB
-----------------------------------------------------
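Back of the envelope, with rounded values from the output above: (676 GiB rbd + 249 GiB buckets.data) * 3 replicas ≈ 2.7 TiB of raw space, comfortably below the 4.0 TiB total, so the actual data doesn't overcommit anything.
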
The main culprit here seems to be the default.rgw.buckets.data pool, but the rbd pool also contains thin-provisioned images.

As in Joe's case, the autoscaler seems to look at the "USED" space rather than the "STORED" bytes:
-----------------------------------------------------
 POOL                         SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE 
 default.rgw.meta            1344k                3.0         4092G  0.0000                 1.0       8              on        
 default.rgw.buckets.index      0                 3.0         4092G  0.0000                 1.0       8              on        
 default.rgw.control            0                 3.0         4092G  0.0000                 1.0       8              on        
 default.rgw.buckets.data   788.6G                3.0         4092G  0.5782                 1.0     128              on        
 .rgw.root                  768.0k                3.0         4092G  0.0000                 1.0       8              on        
 rbd                        710.8G                3.0         4092G  0.5212                 1.0      64              on        
 default.rgw.log                0                 3.0         4092G  0.0000                 1.0       8              on
-----------------------------------------------------
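
Just to illustrate (a minimal sketch, not the actual pg_autoscaler code): applying the documented formula ratio = SIZE * RATE / RAW CAPACITY to the SIZE values in the table above reproduces the RATIO column, and summing the per-pool ratios already lands above 1.0, which looks like what trips the overcommit warning (the reported 1.068x is presumably the same calculation with slightly different momentary values):
-----------------------------------------------------
#!/usr/bin/env python3
# Minimal sketch (not the actual pg_autoscaler code) of the documented
# formula  ratio = size * rate / raw_capacity,  fed with the SIZE column
# from "ceph osd pool autoscale-status" above.  All values in GiB;
# pools with negligible SIZE are left out.

raw_capacity = 4092.0   # RAW CAPACITY from the table
rate = 3.0              # all pools here are 3-replica

size = {
    'default.rgw.buckets.data': 788.6,  # tracks USED from ceph df, not STORED
    'rbd': 710.8,
}

ratios = {pool: s * rate / raw_capacity for pool, s in size.items()}
for pool, r in ratios.items():
    print(f"{pool:28s} {r:.4f}")        # ~0.5782 and ~0.5212, as in the RATIO column

print(f"sum: {sum(ratios.values()):.3f}")  # ~1.10 > 1.0 -> overcommit warning
-----------------------------------------------------
Fed with the STORED bytes from ceph df instead (676 GiB + 249 GiB), the same formula stays well below 1.0 and no warning would fire.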

This does seem like a bug to me. The warning fires on a cluster with only about 36 % raw usage, and the pools are mostly balanced.
Is there already a tracker entry on this? 

Cheers,
	Oliver


On 2019-05-01 22:01, Joe Ryner wrote:
> I think I have figured out the issue.
> 
>  POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE 
>  images    28523G                3.0        68779G  1.2441                  1000              warn 
> 
> My images pool is 28523G with a replication level of 3, and the cluster has a total of 68779G of raw capacity.
> 
>  According to the documentation http://docs.ceph.com/docs/master/rados/operations/placement-groups/  
> 
> "*SIZE* is the amount of data stored in the pool. *TARGET SIZE*, if present, is the amount of data the administrator has specified that they expect to eventually be stored in this pool. The system uses the larger of the two values for its calculation.
> 
> *RATE* is the multiplier for the pool that determines how much raw storage capacity is consumed. For example, a 3 replica pool will have a ratio of 3.0, while a k=4,m=2 erasure coded pool will have a ratio of 1.5.
> 
> *RAW CAPACITY* is the total amount of raw storage capacity on the OSDs that are responsible for storing this pool’s (and perhaps other pools’) data. *RATIO* is the ratio of that total capacity that this pool is consuming (i.e., ratio = size * rate / raw capacity)."
> 
> So ratio = 28523G * 3.0 / 68779G = 1.2441x
> 
> 
> So I'm oversubscribing by 1.2441x, thus the warning. 
> 
> 
> But ... looking at "ceph df":
> 
> POOL         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
> images        3     9.3 TiB       2.82M      28 TiB     57.94       6.7 TiB
> 
> 
> I believe the 9.3 TiB is the amount I have that is thinly provisioned, vs. a fully provisioned 28 TiB?
> 
> The raw capacity of the cluster is sitting at about 50% used.
> 
> 
> Shouldn't the ratio be the amount STORED (from ceph df) * RATE (from ceph osd pool autoscale-status) / Raw Capacity, since Ceph uses thin provisioning in rbd?
> 
> Otherwise, this ratio will only work for people who don't thin provision, which goes against what Ceph is doing with rbd:
> 
> http://docs.ceph.com/docs/master/rbd/
> 
> 
> 
> 
> 
> On Wed, May 1, 2019 at 11:44 AM Joe Ryner <jryner@xxxxxxxx> wrote:
> 
>     I have found a little more information.
>     When I turn off the pg_autoscaler the warning goes away; turn it back on and the warning comes back.
> 
>     I have run the following:
>     # ceph osd pool autoscale-status
>      POOL        SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  PG_NUM  NEW PG_NUM  AUTOSCALE 
>      images    28523G                3.0        68779G  1.2441                  1000              warn      
>      locks     676.5M                3.0        68779G  0.0000                     8              warn      
>      rbd           0                 3.0        68779G  0.0000                     8              warn      
>      data          0                 3.0        68779G  0.0000                     8              warn      
>      metadata   3024k                3.0        68779G  0.0000                     8              warn      
> 
>     # ceph df
>     RAW STORAGE:
>         CLASS     SIZE       AVAIL       USED        RAW USED     %RAW USED 
>         hdd       51 TiB      26 TiB      24 TiB       24 TiB         48.15 
>         ssd       17 TiB     8.5 TiB     8.1 TiB      8.1 TiB         48.69 
>         TOTAL     67 TiB      35 TiB      32 TiB       32 TiB         48.28 
>      
>     POOLS:
>         POOL         ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
>         data          0         0 B           0         0 B         0       6.7 TiB 
>         metadata      1     6.3 KiB          21     3.0 MiB         0       6.7 TiB 
>         rbd           2         0 B           2         0 B         0       6.7 TiB 
>         images        3     9.3 TiB       2.82M      28 TiB     57.94       6.7 TiB 
>         locks         4     215 MiB         517     677 MiB         0       6.7 TiB 
> 
> 
>     It looks to me like the images pool size is not right in the autoscale-status output.
> 
>     Below is the OSD crush tree:
>     # ceph osd crush tree
>     ID  CLASS WEIGHT   (compat) TYPE NAME             
>      -1       66.73337          root default          
>      -3       22.28214 22.28214     rack marack       
>      -8        7.27475  7.27475         host abacus   
>      19   hdd  1.81879  1.81879             osd.19    
>      20   hdd  1.81879  1.42563             osd.20    
>      21   hdd  1.81879  1.81879             osd.21    
>      50   hdd  1.81839  1.81839             osd.50    
>     -10        7.76500  6.67049         host gold     
>       7   hdd  0.86299  0.83659             osd.7     
>       9   hdd  0.86299  0.78972             osd.9     
>      10   hdd  0.86299  0.72031             osd.10    
>      14   hdd  0.86299  0.65315             osd.14    
>      15   hdd  0.86299  0.72586             osd.15    
>      22   hdd  0.86299  0.80528             osd.22    
>      23   hdd  0.86299  0.63741             osd.23    
>      24   hdd  0.86299  0.77718             osd.24    
>      25   hdd  0.86299  0.72499             osd.25    
>      -5        7.24239  7.24239         host hassium  
>       0   hdd  1.80800  1.52536             osd.0     
>       1   hdd  1.80800  1.65421             osd.1     
>      26   hdd  1.80800  1.65140             osd.26    
>      51   hdd  1.81839  1.81839             osd.51    
>      -2       21.30070 21.30070     rack marack2      
>     -12        7.76999  8.14474         host hamms    
>      27   ssd  0.86299  0.99367             osd.27    
>      28   ssd  0.86299  0.95961             osd.28    
>      29   ssd  0.86299  0.80768             osd.29    
>      30   ssd  0.86299  0.86893             osd.30    
>      31   ssd  0.86299  0.92583             osd.31    
>      32   ssd  0.86299  1.00227             osd.32    
>      33   ssd  0.86299  0.73099             osd.33    
>      34   ssd  0.86299  0.80766             osd.34    
>      35   ssd  0.86299  1.04811             osd.35    
>      -7        5.45636  5.45636         host parabola 
>       5   hdd  1.81879  1.81879             osd.5     
>      12   hdd  1.81879  1.81879             osd.12    
>      13   hdd  1.81879  1.81879             osd.13    
>      -6        2.63997  3.08183         host radium   
>       2   hdd  0.87999  1.05594             osd.2     
>       6   hdd  0.87999  1.10501             osd.6     
>      11   hdd  0.87999  0.92088             osd.11    
>      -9        5.43439  5.43439         host splinter 
>      16   hdd  1.80800  1.80800             osd.16    
>      17   hdd  1.81839  1.81839             osd.17    
>      18   hdd  1.80800  1.80800             osd.18    
>     -11       23.15053 23.15053     rack marack3      
>     -13        8.63300  8.98921         host helm     
>      36   ssd  0.86299  0.71931             osd.36    
>      37   ssd  0.86299  0.92601             osd.37    
>      38   ssd  0.86299  0.79585             osd.38    
>      39   ssd  0.86299  1.08521             osd.39    
>      40   ssd  0.86299  0.89500             osd.40    
>      41   ssd  0.86299  0.92351             osd.41    
>      42   ssd  0.86299  0.89690             osd.42    
>      43   ssd  0.86299  0.92480             osd.43    
>      44   ssd  0.86299  0.84467             osd.44    
>      45   ssd  0.86299  0.97795             osd.45    
>     -40        7.27515  7.89609         host samarium 
>      46   hdd  1.81879  1.90242             osd.46    
>      47   hdd  1.81879  1.86723             osd.47    
>      48   hdd  1.81879  1.93404             osd.48    
>      49   hdd  1.81879  2.19240             osd.49    
>      -4        7.24239  7.24239         host scandium 
>       3   hdd  1.80800  1.76680             osd.3     
>       4   hdd  1.80800  1.80800             osd.4     
>       8   hdd  1.80800  1.80800             osd.8     
>      52   hdd  1.81839  1.81839             osd.52    
> 
> 
>     Any ideas?
> 
> 
> 
> 
> 
>     On Wed, May 1, 2019 at 9:32 AM Joe Ryner <jryner@xxxxxxxx> wrote:
> 
>         Hi,
> 
>         I have an old Ceph cluster and recently upgraded from Luminous to Nautilus.  After converting to Nautilus, I decided it was time to convert to BlueStore.
> 
>         Before I converted, the cluster was healthy, but afterwards I have a HEALTH_WARN:
> 
>         # ceph health detail
>         HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
>         POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
>             Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_bytes    0  on pools []
>         POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
>             Pools ['data', 'metadata', 'rbd', 'images', 'locks'] overcommit available storage by 1.244x due to target_size_ratio 0.000 on pools []
> 
>         I started with a target_size_ratio of 0.85 on the images pool and reduced it to 0, hoping the warning would go away.  The cluster seems to be running fine; I just can't figure out what the problem is and how to make the message go away.  I restarted the monitors this morning in hopes of fixing it.  Does anyone have any ideas?
> 
>         Thanks in advance
> 
> 
> -- 
> Joe Ryner
> Associate Director
> Center for the Application of Information Technologies (CAIT) - http://www.cait.org
> Western Illinois University - http://www.wiu.edu
> 
> 
> P: (309) 298-1804
> F: (309) 298-2806
> 
> 



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
