Hey ceph-users,
may I ask (nag) about this issue again? I am wondering if anybody can
confirm my observations.
I raised a bug at https://tracker.ceph.com/issues/54136, but apart from it
being assigned to a dev a while ago there has been no response yet.
Maybe I am just holding it wrong; if so, please enlighten me.
Thank you and with kind regards
Christian
On 02/02/2022 20:10, Christian Rohmann wrote:
Hey ceph-users,
I am debugging a mgr pg_autoscaler WARN which states that the
target_size_bytes set on a pool would overcommit the available storage.
There is only one pool with a value for target_size_bytes defined (=5T),
and that apparently would consume more than the available storage:
--- cut ---
# ceph health detail
HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes
[WRN] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED: 1 subtrees have
overcommitted pool target_size_bytes
Pools ['backups', 'images', 'device_health_metrics', '.rgw.root',
'redacted.rgw.control', 'redacted.rgw.meta', 'redacted.rgw.log',
'redacted.rgw.otp', 'redacted.rgw.buckets.index',
'redacted.rgw.buckets.data', 'redacted.rgw.buckets.non-ec'] overcommit
available storage by 1.011x due to target_size_bytes 15.0T on pools
['redacted.rgw.buckets.data'].
--- cut ---
But then looking at the actual usage it seems strange that 15T (5T * 3
replicas) should not fit into the remaining 122 TiB AVAIL (see the quick
check right after the output):
--- cut ---
# ceph df detail
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 293 TiB 122 TiB 171 TiB 171 TiB 58.44
TOTAL 293 TiB 122 TiB 171 TiB 171 TiB 58.44
--- POOLS ---
POOL                         ID  PGS   STORED   (DATA)   (OMAP)   OBJECTS  USED     (DATA)   (OMAP)   %USED  MAX AVAIL  QUOTA OBJECTS  QUOTA BYTES  DIRTY  USED COMPR  UNDER COMPR
backups                       1  1024  92 TiB   92 TiB   3.8 MiB  28.11M   156 TiB  156 TiB  11 MiB   64.77  28 TiB     N/A            N/A          N/A    39 TiB      123 TiB
images                        2    64  1.7 TiB  1.7 TiB  249 KiB  471.72k  5.2 TiB  5.2 TiB  748 KiB   5.81  28 TiB     N/A            N/A          N/A    0 B         0 B
device_health_metrics        19     1  82 MiB   0 B      82 MiB   43       245 MiB  0 B      245 MiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
.rgw.root                    21    32  23 KiB   23 KiB   0 B      25       4.1 MiB  4.1 MiB  0 B          0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.control         22    32  0 B      0 B      0 B      8        0 B      0 B      0 B          0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.meta            23    32  1.7 MiB  394 KiB  1.3 MiB  1.38k    237 MiB  233 MiB  3.9 MiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.log             24    32  53 MiB   500 KiB  53 MiB   7.60k    204 MiB  47 MiB   158 MiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.otp             25    32  5.2 KiB  0 B      5.2 KiB  0        16 KiB   0 B      16 KiB       0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.buckets.index   26    32  1.2 GiB  0 B      1.2 GiB  7.46k    3.5 GiB  0 B      3.5 GiB      0  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.buckets.data    27   128  3.1 TiB  3.1 TiB  0 B      3.53M    9.5 TiB  9.5 TiB  0 B      10.11  28 TiB     N/A            N/A          N/A    0 B         0 B
redacted.rgw.buckets.non-ec  28    32  0 B      0 B      0 B      0        0 B      0 B      0 B          0  28 TiB     N/A            N/A          N/A    0 B         0 B
--- cut ---
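Just to spell the arithmetic out as I understand it (a naive
back-of-the-envelope using the numbers above, not the autoscaler's actual
formula):
--- cut ---
# Naive back-of-the-envelope with the numbers from "ceph df" above,
# assuming the autoscaler compares expected raw usage against raw capacity.
TiB = 1024 ** 4

raw_used     = 171 * TiB       # RAW USED of the whole cluster
raw_capacity = 293 * TiB       # raw SIZE of the whole cluster

# redacted.rgw.buckets.data currently uses 9.5 TiB raw; honouring its
# target_size_bytes of 5 TiB would need 5 TiB * 3 replicas = 15 TiB raw.
expected_raw = raw_used - 9.5 * TiB + 5 * 3 * TiB

print(expected_raw / raw_capacity)   # ~0.60 - nowhere near an overcommit
--- cut ---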
I then looked at how those values are determined at
https://github.com/ceph/ceph/blob/9f723519257eca039126a20aa6a2a7d2dbfb5dba/src/pybind/mgr/pg_autoscaler/module.py#L509.
Apparently "total_bytes" is compared with the capacity of the root_map.
I added a debug line and found that the total in my cluster was already at:
total=325511007759696
so almost 300 TiB. Compared to the 171 TiB of raw usage that "ceph df"
reports above, this seems strange.
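Interestingly that number does roughly reproduce the reported ratio if it
is compared against the raw capacity from "ceph df" (just arithmetic,
assuming the root capacity is the ~293 TiB raw SIZE):
--- cut ---
# Checking the debug total against the raw capacity from "ceph df".
TiB = 1024 ** 4

total    = 325511007759696     # debug value logged by the pg_autoscaler
capacity = 293 * TiB           # raw SIZE of the (only) hdd root

print(total / TiB)             # ~296 TiB
print(total / capacity)        # ~1.01, roughly the "1.011x" from the health warning
--- cut ---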
Looking at how this total is calculated at
https://github.com/ceph/ceph/blob/9f723519257eca039126a20aa6a2a7d2dbfb5dba/src/pybind/mgr/pg_autoscaler/module.py#L441,
you can see that per pool the larger value (max) of "actual_raw_used" and
"target_bytes * raw_used_rate" is taken, and those values are then summed up.
I dumped the values for all pools in my cluster with yet another line of
debug code:
--- cut ---
pool_id 1  - actual_raw_used=303160109187420.0, target_bytes=0             raw_used_rate=3.0
pool_id 2  - actual_raw_used=5714098884702.0,   target_bytes=0             raw_used_rate=3.0
pool_id 19 - actual_raw_used=256550760.0,       target_bytes=0             raw_used_rate=3.0
pool_id 21 - actual_raw_used=71433.0,           target_bytes=0             raw_used_rate=3.0
pool_id 22 - actual_raw_used=0.0,               target_bytes=0             raw_used_rate=3.0
pool_id 23 - actual_raw_used=5262798.0,         target_bytes=0             raw_used_rate=3.0
pool_id 24 - actual_raw_used=162299940.0,       target_bytes=0             raw_used_rate=3.0
pool_id 25 - actual_raw_used=16083.0,           target_bytes=0             raw_used_rate=3.0
pool_id 26 - actual_raw_used=3728679936.0,      target_bytes=0             raw_used_rate=3.0
pool_id 27 - actual_raw_used=10035209699328.0,  target_bytes=5497558138880 raw_used_rate=3.0
pool_id 28 - actual_raw_used=0.0,               target_bytes=0             raw_used_rate=3.0
--- cut ---
All values but those of pool_id 1 (backups) make sense. For backups the
autoscaler reports a MUCH larger actual_raw_used value than the USED
shown via "ceph df".
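If I am not mistaken, that value lines up with STORED times the
replication factor rather than with the raw USED column (again just
arithmetic on the numbers above, so a hint rather than proof):
--- cut ---
# The backups value from the debug output vs. the "ceph df" numbers.
TiB = 1024 ** 4

actual_raw_used = 303160109187420   # pool_id 1 (backups) from the debug output
print(actual_raw_used / TiB)        # ~275.7 TiB

print(92 * 3)                       # STORED 92 TiB * 3 replicas = 276 TiB -> matches
# ...while "ceph df" only shows 156 TiB USED for the pool, i.e. the
# compression savings seem to be ignored entirely.
--- cut ---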
The only difference between that pool and the others is that compression
is enabled:
--- cut ---
# ceph osd pool get backups compression_mode
compression_mode: aggressive
--- cut ---
Apparently there already was a similar issue
(https://tracker.ceph.com/issues/41567) with a resulting commit
(https://github.com/ceph/ceph/commit/dd6e752826bc762095be4d276e3c1b8d31293eb0)
which changed "pool_logical_used" from the "bytes_used" to the "stored"
field.
But how does that take compressed (away) data into account? Does "stored"
count all the logical bytes, i.e. the uncompressed bytes for pools with
compression, which then get multiplied by the replication factor?
This surely must be a bug then, as those bytes are not really
"actual_raw_used".
I was about to raise a bug, but wanted to ask here on the ML first in
case I am misunderstanding the mechanisms at play.
Thanks and with kind regards,
Christian