Hi Nathan,

Thu, 24 Oct 2019 10:59:55 -0400
Nathan Fish <lordcirth@xxxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :
> Ah, I see! The BIAS reflects the number of placement groups it should
> create. Since cephfs metadata pools are usually very small, but have
> many objects and high IO, the autoscaler gives them 4x the number of
> placement groups that it would normally give for that amount of data.

Ah, OK, I understand.

> So, your cephfs_data is set to a ratio of 0.9, and cephfs_metadata to
> 0.3? Are the two pools using entirely different device classes, so
> they are not sharing space?

Yes, the metadata is on SSDs and the data on HDDs.

> Anyway, I see that your overcommit is only "1.031x". So if you set
> cephfs_data to 0.85, it should go away.

This is not the case. I set the target_ratio to 0.7 and get this:

POOL              SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs_metadata  15736M                 3.0         2454G  0.0188        0.3000   4.0     256              on
cephfs_data      122.2T                 1.5        165.4T  1.1085        0.7000   1.0    1024              on

The RATIO seems to have nothing to do with the TARGET RATIO, only with the SIZE and the RAW CAPACITY.
Because the pool is still getting more data, the SIZE increases and therefore the RATIO increases.
The RATIO seems to be calculated by this formula:

RATIO = SIZE * RATE / RAW_CAPACITY

This is what I don't understand. The data in the cephfs_data pool seems to need more space than the raw capacity of the cluster provides. Hence the situation is called "overcommitment".

But why is this only the case when the autoscaler is active?

Thanks
Lars

> On Thu, Oct 24, 2019 at 10:09 AM Lars Täuber <taeuber@xxxxxxx> wrote:
> >
> > Thanks Nathan for your answer,
> >
> > but I set the Target Ratio to 0.9. It is the cephfs_data pool that causes the trouble.
> >
> > The 4.0 is the BIAS from the cephfs_metadata pool. This "BIAS" is not explained on the page linked below, so I don't know its meaning.
> >
> > How can a pool be overcommitted when it is the only pool on a set of OSDs?
> >
> > Best regards,
> > Lars
> >
> > Thu, 24 Oct 2019 09:39:51 -0400
> > Nathan Fish <lordcirth@xxxxxxxxx> ==> Lars Täuber <taeuber@xxxxxxx> :
> > > The formatting is mangled on my phone, but if I am reading it correctly,
> > > you have set the Target Ratio to 4.0. This means you have told the balancer
> > > that this pool will occupy 4x the space of your whole cluster, and to
> > > optimize accordingly. This is naturally a problem. Setting it to 0 will
> > > clear the setting and allow the autoscaler to work.
> > >
> > > On Thu., Oct. 24, 2019, 5:18 a.m. Lars Täuber, <taeuber@xxxxxxx> wrote:
> > >
> > > > This question is answered here:
> > > > https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/
> > > >
> > > > But it tells me that there is more data stored in the pool than the raw
> > > > capacity provides (taking the replication factor RATE into account),
> > > > hence the RATIO being above 1.0.
> > > >
> > > > How come this is the case? Is data stored outside of the pool?
> > > > How come this is only the case when the autoscaler is active?
> > > >
> > > > Thanks
> > > > Lars
> > > >
> > > > Thu, 24 Oct 2019 10:36:52 +0200
> > > > Lars Täuber <taeuber@xxxxxxx> ==> ceph-users@xxxxxxx :
> > > > > My question requires too complex an answer.
> > > > > So let me ask a simple question:
> > > > >
> > > > > What does the SIZE column of "osd pool autoscale-status" mean, and where does it come from?
> > > > >
> > > > > Thanks
> > > > > Lars
> > > > >
> > > > > Wed, 23 Oct 2019 14:28:10 +0200
> > > > > Lars Täuber <taeuber@xxxxxxx> ==> ceph-users@xxxxxxx :
> > > > > > Hello everybody!
> > > > > >
> > > > > > What does this mean?
> > > > > >
> > > > > >   health: HEALTH_WARN
> > > > > >             1 subtrees have overcommitted pool target_size_bytes
> > > > > >             1 subtrees have overcommitted pool target_size_ratio
> > > > > >
> > > > > > and what does it have to do with the autoscaler?
> > > > > > When I deactivate the autoscaler the warning goes away.
> > > > > >
> > > > > > $ ceph osd pool autoscale-status
> > > > > > POOL              SIZE    TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
> > > > > > cephfs_metadata  15106M                 3.0         2454G  0.0180        0.3000   4.0     256              on
> > > > > > cephfs_data      113.6T                 1.5        165.4T  1.0306        0.9000   1.0     512              on
> > > > > >
> > > > > > $ ceph health detail
> > > > > > HEALTH_WARN 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio
> > > > > > POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
> > > > > >     Pools ['cephfs_data'] overcommit available storage by 1.031x due to target_size_bytes 0 on pools []
> > > > > > POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
> > > > > >     Pools ['cephfs_data'] overcommit available storage by 1.031x due to target_size_ratio 0.900 on pools ['cephfs_data']
> > > > > >
> > > > > > Thanks
> > > > > > Lars

--
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23
10117 Berlin
Tel.: +49 30 20370-352
http://www.bbaw.de

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
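A quick sanity check of the formula quoted in the thread, RATIO = SIZE * RATE / RAW CAPACITY, against the second autoscale-status output (using the rounded values as printed, so the result is only approximate):

cephfs_data: 122.2T * 1.5 / 165.4T ≈ 1.108, which matches the reported RATIO of 1.1085 up to rounding of the printed SIZE.

This supports Lars's observation that RATIO depends only on SIZE, RATE and RAW CAPACITY, not on the TARGET RATIO. It is also worth noting that the "1.031x" in the health warning equals the 1.0306 RATIO from the first autoscale-status output, which suggests the overcommit figure is driven by the pool's actual raw usage exceeding the subtree capacity rather than by the target ratio setting itself.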
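For reference, the settings discussed in the thread correspond roughly to the following pool properties. This is only a sketch based on the Nautilus-era tooling shown above, using the pool names from this cluster; the exact syntax should be confirmed against the documentation for your release.

# set or change the TARGET RATIO on a pool
$ ceph osd pool set cephfs_data target_size_ratio 0.7
# setting it to 0 clears it again, as Nathan notes
$ ceph osd pool set cephfs_data target_size_ratio 0
# the 4.0 BIAS shown for the metadata pool corresponds to
$ ceph osd pool set cephfs_metadata pg_autoscale_bias 4
# enable/disable the autoscaler per pool (on, warn, off)
$ ceph osd pool set cephfs_data pg_autoscale_mode off
# re-check the autoscaler's view and the health warning
$ ceph osd pool autoscale-status
$ ceph health detail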