Re: Pool with ghost used space

No bench objects present.

The size of both pools is 2.

The USED value (column 5), 143TiB, is just one copy, am I right? At the end
of the ceph df detail line, RAW USED (the last column) shows 286TiB, the
value for the 2 copies.

I am aware that both pools share OSDs, but my point is that one of the pools
shows a value much higher than the space that should actually be allocated.
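
To illustrate what I mean, a minimal check (assuming both are plain
replicated pools) would be:

# confirm the replication factor of each pool
ceph osd pool get volumes size
ceph osd pool get volumes-dr size

# USED x size should roughly match RAW USED in ceph df detail,
# e.g. 143TiB x 2 = 286TiB for volumes-dr
ceph df detail | grep volumes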

On Mon, Apr 11, 2022 at 21:21, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> Any chance there are `rados bench` artifacts?
>
> rados ls -p volumes-dr | egrep '^bench'
>
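> A rough sketch (assuming the default "benchmark_data_" prefix that rados
> bench uses) to total the size of any such leftovers:
>
> for obj in $(rados ls -p volumes-dr | grep '^benchmark_data'); do
>     rados -p volumes-dr stat "$obj"
> done | awk '{ SUM += $NF } END { print SUM/1024/1024/1024 " GB" }'
>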
> I suspect though that in part you may be confused by raw vs used.
>
> In your volumes-dr pool, 143TB is the *Raw* space used.  If you’re doing
> 3R, the numbers would be closer to aligning.
>
> Also remember to account for any of these pools sharing the same OSDs.
>
> > On Apr 11, 2022, at 4:28 PM, Joao Victor Rodrigues Soares <jvsoares@binario.cloud> wrote:
> >
> > Hi everybody,
> >
> >
> > We have a Ceph Luminous cluster with 184 SSD OSDs. About 1 year ago we
> > noticed abnormal growth in one of the cluster's pools.
> > This pool is configured with mirroring to another Ceph cluster in
> > another datacenter. Below is the consumption of the two main pools.
> >
> > #PRIMARY CLUSTER
> > [root@ceph01 ~]# ceph df detail
> > GLOBAL:
> >    SIZE       AVAIL      RAW USED     %RAW USED     OBJECTS
> >    659TiB     240TiB       419TiB         63.60      43.34M
> > POOLS:
> >    NAME           ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS      DIRTY       READ        WRITE       RAW USED
> >    images-dr      8      N/A               N/A             1.24TiB     6.42      18.2TiB       163522       163.52k     42.6GiB     247MiB      3.73TiB
> >    volumes        11     N/A               N/A             59.1TiB     68.46     27.2TiB       18945218     18.95M      4.81GiB     4.16GiB     118TiB
> >    volumes-dr     12     N/A               N/A             143TiB      83.99     27.2TiB       22108005     22.11M      1.84GiB     918MiB      286TiB
> >
> > To verify the actual consumption of images within the pools, we run the
> > rbd diff command within the pool and then add up all the results.
> >
> > for j in $(rbd ls volumes)
> > do
> >     i=$((i+1))
> >     size=$(rbd diff volumes/$j | awk '{ SUM += $2 } END { print SUM/1024/1024/1024 " GB" }')
> >     echo "$j;$size" >> /var/lib/report-volumes/`date +%F`-volumes.txt
> > done
> >
> > In the "volumes" pool, we got a value of 56,455.43GB (56TB) - a value
> close
> > to that shown by the ceph df command (59.1TiB).
> >
> > for j in $(rbd ls volumes-dr)
> > do
> >     i=$((i+1))
> >     size=$(rbd diff volumes-dr/$j | awk '{ SUM += $2 } END { print SUM/1024/1024/1024 " GB" }')
> >     echo "$j;$size" >> /var/lib/report-volumes/`date +%F`-volumes.txt
> > done
> >
> > In the "volumes-dr" pool, we got the value of 40,726.51 (38TB) - a much
> > lower value than the one shown by the ceph df command (143TiB)
> >
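> > As a cross-check (just a sketch, not something we have run yet), rbd du
> > reports per-image usage and, if I recall correctly, also lists space still
> > held by snapshots; with the fast-diff feature enabled it is reasonably fast:
> >
> > rbd du -p volumes-dr
> >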
> > Another characteristic of these two pools is that daily snapshots of all
> > images are taken, and each image has a retention period (daily, weekly or
> > monthly). I thought this anomaly could be related to the snapshots, but we
> > have already purged all the snapshots without any significant effect on
> > the pools.
> > I've already searched forums about unclaimed space, but haven't found
> > anything concrete.
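> >
> > Two related checks we could still run (I am not sure they will explain
> > anything):
> >
> > # any snapshots left on any image?
> > for j in $(rbd ls volumes-dr); do rbd snap ls volumes-dr/$j; done
> >
> > # PGs still trimming purged snapshots would mean the space has simply
> > # not been reclaimed yet
> > ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim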
> >
> > As for the mirrored pool in the DR datacenter, the value shown is much
> > more consistent with the one obtained with rbd diff - 56.5TiB.
> > We use the "pool" mirroring mode, and both the source and the destination
> > currently have the same number of images: 223
> >
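> > A quick sketch of how both sides can be compared (nothing more than the
> > image count and the mirroring state):
> >
> > rbd ls -p volumes-dr | wc -l                 # run on each cluster; expect 223
> > rbd mirror pool status volumes-dr --verbose  # per-image replication state
> >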
> > #CLUSTER DR
> > [root@ceph-dr01 ~]# ceph df detail
> > GLOBAL:
> >    SIZE       AVAIL       RAW USED     %RAW USED     OBJECTS
> >    217TiB     97.6TiB       119TiB         54.98      16.73M
> > POOLS:
> >    NAME           ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS      DIRTY       READ        WRITE       RAW USED
> >    images-dr      1      N/A               N/A             1.37TiB     6.89      18.5TiB       179953       179.95k     390MiB      198MiB      4.11TiB
> >    volumes-dr     3      N/A               N/A             56.5TiB     67.03     27.8TiB       16548170     16.55M      23.2GiB     59.0GiB     113TiB
> >
> >
> > Other infrastructure information:
> > 4 virtualized monitors on CentOS 7.9.2009 (Core)
> >
> > 10 storage nodes (99 osds) with CentOS 7.9.2009 and Ceph 12.2.12
> > 8 storage nodes (84 osds) with CentOS 7.9.2009 and Ceph 12.2.13
> >
> > [root@ceph01]# ceph versions
> > {
> >    "mon": {
> >        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 4
> >    },
> >    "mgr": {
> >        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 4
> >    },
> >    "osd": {
> >        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 99,
> >        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 84
> >    },
> >    "mds": {},
> >    "rbd-mirror": {
> >        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 1
> >    },
> >    "overall": {
> >        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 99,
> >        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 93
> >    }
> > }
> >
> > Another piece of information: apparently this anomaly started after the
> > inclusion of the last 4 storage nodes, which have disks of a different
> > size - 3.8TB (the other 14 storage nodes have 4TB disks). But at the same
> > time I think that if the disks were the problem, the other pool would also
> > be affected.
> >
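> > One more thing that could be checked (just a suggestion on my side) is
> > whether those newer, smaller disks are filling unevenly:
> >
> > # per-OSD utilisation grouped by host; large VAR values point to imbalance
> > ceph osd df tree
> >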
> > Has anyone ever faced such a situation?
> >
> > João Victor Soares.
> > Binario Cloud
> >
>
>

-- 
*Notice: this message is intended exclusively for the person(s) to whom it
is addressed and may contain confidential and legally protected information.
If you are not the intended recipient, you are hereby notified to refrain
from disclosing, copying, distributing, examining or in any way using the
information contained in this message, as doing so is illegal. If you have
received this message in error, please reply and let us know.*
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



