On Thu, Dec 12, 2019 at 9:12 AM Ashley Merrick <singapore@xxxxxxxxxxxxxx> wrote:
>
> Due to the recent 5.3.x kernels having support for object-map and
> other features required by krbd, I have now enabled
> object-map,fast-diff on some RBD images with Ceph 14.2.5 and rebuilt
> the object maps using "rbd object-map rebuild".
>
> However, for some RBD images the Provisioned/Total Provisioned listed
> in the Ceph MGR is the full RBD size and not the true size reflected
> inside the VM by df -h. I have discard enabled and have run fstrim,
> and I know that, for example, a 20TB RBD has never gone above the 9TB
> currently shown by df -h, yet the Ceph MGR shows 20TB under
> Provisioned/Total Provisioned.
>
> Not sure if I am hitting a bug, or if this is expected behavior?

Unless you know *exactly* what the filesystem is doing in your case and
see an inconsistency, this is expected.

If you are interested, here is an example:

$ rbd create --size 1G img
$ sudo rbd map img
/dev/rbd0
$ sudo mkfs.ext4 /dev/rbd0
$ sudo mount /dev/rbd0 /mnt
$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0       976M  2.6M  907M   1% /mnt
$ rbd du img
NAME PROVISIONED   USED
img        1 GiB 60 MiB
$ ceph df | grep -B1 rbd
POOL ID STORED  OBJECTS USED    %USED MAX AVAIL
rbd   1 33 MiB       20 33 MiB      0  1013 GiB

After I create a big file, almost the entire image is shown as used:

$ dd if=/dev/zero of=/mnt/file bs=1M count=900
$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0       976M  903M  6.2M 100% /mnt
$ rbd du img
NAME PROVISIONED    USED
img        1 GiB 956 MiB
$ ceph df | grep -B1 rbd
POOL ID STORED   OBJECTS USED     %USED MAX AVAIL
rbd   1 933 MiB      248 933 MiB   0.09  1012 GiB

Now, if I carefully punch out most of that file, leaving one page in
each megabyte, and run fstrim:

$ for ((i = 0; i < 900; i++)); do fallocate -p -n -o $((i * 2**20)) -l $((2**20 - 4096)) /mnt/file; done
$ sudo fstrim /mnt
$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0       976M  6.1M  903M   1% /mnt
$ rbd du img
NAME PROVISIONED    USED
img        1 GiB 956 MiB
$ ceph df | grep -B1 rbd
POOL ID STORED  OBJECTS USED   %USED MAX AVAIL
rbd   1 36 MiB      248 36 MiB     0  1013 GiB

You can see that df -h is back to ~6M, but "rbd du" USED remained the
same. This is because "rbd du" is very coarse-grained: it works at the
object level and doesn't go any deeper. If the number of objects and
their sizes remain the same, "rbd du" USED remains the same. It doesn't
account for the sparseness I produced above.

"ceph df" goes down to the individual bluestore blobs, but only per
pool. Looking at STORED, you can see that the space is back, even
though the number of objects remained the same. Unfortunately, there
is no (fast) way to get the same information per image.

So what you see in the dashboard is basically "rbd du": it is fast to
compute (especially when the object map is enabled), but it shows you
the picture at the object level, not at the blob level.

Thanks,

                Ilya
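
P.S. If you want to see this at the RADOS level, you can stat the
image's backing objects directly. A quick sketch continuing the example
above (the block_name_prefix value and the exact output are from a
hypothetical run and will differ for you):

$ rbd info img | grep block_name_prefix
        block_name_prefix: rbd_data.10086b8b4567
$ rados -p rbd ls | grep '^rbd_data\.10086b8b4567\.' | wc -l
248
$ rados -p rbd stat rbd_data.10086b8b4567.0000000000000001
rbd/rbd_data.10086b8b4567.0000000000000001 mtime 2019-12-12 ..., size 4194304

All 248 objects are still there, and each still reports its full 4 MiB
logical size, which is what "rbd du" counts. The holes punched inside
the objects are visible only to bluestore, which is why only "ceph df"
STORED went down.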
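
P.P.S. Two quick sanity checks on the discard path that may help when
comparing df -h against the dashboard. A sketch, assuming the image is
mapped at /dev/rbd0; both output values are illustrative, and the
granularity in particular depends on the kernel version and mapping
options:

$ cat /sys/block/rbd0/queue/discard_granularity
65536
$ sudo fstrim -v /mnt
/mnt: 897.5 MiB (941154304 bytes) trimmed

The block layer drops the unaligned head and tail of each discard, so
freed ranges smaller than the advertised granularity never reach the
OSDs and stay allocated.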