Hi Nathan,

Thu, 24 Oct 2019 10:59:55 -0400
Nathan Fish <lordcirth@xxxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :
> Ah, I see! The BIAS reflects the number of placement groups it should
> create. Since cephfs metadata pools are usually very small, but have
> many objects and high IO, the autoscaler gives them 4x the number of
> placement groups that it would normally give for that amount of data.

Ah, OK, I understand.

> So, your cephfs_data is set to a ratio of 0.9, and cephfs_metadata to
> 0.3? Are the two pools using entirely different device classes, so
> they are not sharing space?

Yes, the metadata is on SSDs and the data on HDDs.

> Anyway, I see that your overcommit is only "1.031x". So if you set
> cephfs_data to 0.85, it should go away.

This is not the case. I set the target_ratio to 0.7 and get this:

POOL              SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs_metadata  15736M                 3.0         2454G  0.0188        0.3000   4.0     256              on
cephfs_data      122.2T                 1.5        165.4T  1.1085        0.7000   1.0    1024              on

The RATIO seems to have nothing to do with the TARGET RATIO, only with the SIZE and the RAW CAPACITY.
Because the pool is still getting more data, the SIZE increases and therefore the RATIO increases.
The RATIO seems to be calculated by this formula:

RATIO = SIZE * RATE / RAW_CAPACITY

This is what I don't understand. The data in the cephfs_data pool seems to need more space than the raw capacity of the cluster provides. Hence the situation is called "overcommitment".

But why is this only the case when the autoscaler is active?

Thanks
Lars

> On Thu, Oct 24, 2019 at 10:09 AM Lars Täuber <taeuber@xxxxxxx> wrote:
> >
> > Thanks Nathan for your answer,
> >
> > but I set the Target Ratio to 0.9. It is the cephfs_data pool that causes the trouble.
> >
> > The 4.0 is the BIAS from the cephfs_metadata pool. This "BIAS" is not explained on the page linked below, so I don't know its meaning.
> >
> > How can a pool be overcommitted when it is the only pool on a set of OSDs?
> >
> > Best regards,
> > Lars
> >
> > Thu, 24 Oct 2019 09:39:51 -0400
> > Nathan Fish <lordcirth@xxxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :
> > > The formatting is mangled on my phone, but if I am reading it correctly,
> > > you have set the Target Ratio to 4.0. This means you have told the balancer
> > > that this pool will occupy 4x the space of your whole cluster, and to
> > > optimize accordingly. This is naturally a problem. Setting it to 0 will
> > > clear the setting and allow the autoscaler to work.
> > >
> > > On Thu., Oct. 24, 2019, 5:18 a.m. Lars Täuber, <taeuber@xxxxxxx> wrote:
> > >
> > > > This question is answered here:
> > > > https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/
> > > >
> > > > But it tells me that there is more data stored in the pool than the raw
> > > > capacity provides (taking the replication factor RATE into account),
> > > > hence the RATIO being above 1.0.
> > > >
> > > > How come this is the case? Is data stored outside of the pool?
> > > > How come this is only the case when the autoscaler is active?
> > > >
> > > > Thanks
> > > > Lars
> > > >
> > > > Thu, 24 Oct 2019 10:36:52 +0200
> > > > Lars Täuber <taeuber@xxxxxxx> ==> ceph-users@xxxxxxx :
> > > > > My question requires too complex an answer.
> > > > > So let me ask a simple question:
> > > > >
> > > > > What does the SIZE column of "osd pool autoscale-status" mean, and where does it come from?
> > > > >
> > > > > Thanks
> > > > > Lars
> > > > >
> > > > > Wed, 23 Oct 2019 14:28:10 +0200
> > > > > Lars Täuber <taeuber@xxxxxxx> ==> ceph-users@xxxxxxx :
> > > > > > Hello everybody!
> > > > > >
> > > > > > What does this mean?
> > > > > >
> > > > > >   health: HEALTH_WARN
> > > > > >             1 subtrees have overcommitted pool target_size_bytes
> > > > > >             1 subtrees have overcommitted pool target_size_ratio
> > > > > >
> > > > > > and what does it have to do with the autoscaler?
> > > > > > When I deactivate the autoscaler the warning goes away.
> > > > > >
> > > > > > $ ceph osd pool autoscale-status
> > > > > > POOL              SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
> > > > > > cephfs_metadata  15106M                 3.0         2454G  0.0180        0.3000   4.0     256              on
> > > > > > cephfs_data      113.6T                 1.5        165.4T  1.0306        0.9000   1.0     512              on
> > > > > >
> > > > > > $ ceph health detail
> > > > > > HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
> > > > > > POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
> > > > > >     Pools ['cephfs_data'] overcommit available storage by 1.031x due to target_size_bytes 0 on pools []
> > > > > > POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
> > > > > >     Pools ['cephfs_data'] overcommit available storage by 1.031x due to target_size_ratio 0.900 on pools ['cephfs_data']
> > > > > >
> > > > > > Thanks
> > > > > > Lars

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23
10117 Berlin
Tel.: +49 30 20370-352
http://www.bbaw.de

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
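A quick sanity check of the formula quoted in the thread, RATIO = SIZE * RATE / RAW CAPACITY, against the second autoscale-status output (using the rounded values as printed, so the result is only approximate):

cephfs_data: 122.2T * 1.5 / 165.4T ≈ 1.108, which matches the reported RATIO of 1.1085 up to rounding of the printed SIZE.

This supports Lars's observation that RATIO depends only on SIZE, RATE and RAW CAPACITY, not on the TARGET RATIO. It is also worth noting that the "1.031x" in the health warning equals the 1.0306 RATIO from the first autoscale-status output, which suggests the overcommit figure is driven by the pool's actual raw usage exceeding the subtree capacity rather than by the target ratio setting itself.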
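For reference, the settings discussed in the thread correspond roughly to the following pool properties. This is only a sketch based on the Nautilus-era tooling shown above, using the pool names from this cluster; the exact syntax should be confirmed against the documentation for your release.

# set or change the TARGET RATIO on a pool
$ ceph osd pool set cephfs_data target_size_ratio 0.7
# setting it to 0 clears it again, as Nathan notes
$ ceph osd pool set cephfs_data target_size_ratio 0
# the 4.0 BIAS shown for the metadata pool corresponds to
$ ceph osd pool set cephfs_metadata pg_autoscale_bias 4
# enable/disable the autoscaler per pool (on, warn, off)
$ ceph osd pool set cephfs_data pg_autoscale_mode off
# re-check the autoscaler's view and the health warning
$ ceph osd pool autoscale-status
$ ceph health detail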