ceph df reporting incorrect used space after pg reduction

Howdy Ceph-Users!

Over the past few days, I've noticed an interesting behavior in 15.2.15
that I'm curious if anyone else can reproduce. After setting up a few pools
and running some load against them, I lowered the number of PGs in the
TestA pool from 4096 to 1024. To track the progress of the PG merges, I
stuck a watch on `ceph df` and let it run. No I/O was happening on the
cluster while the PG count was being reduced. After a few hours, I came back
to see that the process had completed, but `ceph df` was now reporting very
different usage values from what I had started with.

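Something along these lines will produce the snippet below (the exact
filter and interval here are illustrative):

while true; do
    echo "============================================================================"
    date -u
    ceph df | grep -E 'TestA|TestB|test1|test2'
    echo "============================================================================"
    sleep 10
done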

============================================================================
Thu May 26 16:39:55 UTC 2022
TestA                  10  1412   30 GiB    1.59k   37 GiB      0     20 PiB
TestB                  11   256  537 KiB        1  1.6 MiB      0    8.0 PiB
test1                  12    32   58 GiB    3.38k  174 GiB      0    8.0 PiB
test2                  13    64  916 KiB        5  3.2 MiB      0    8.0 PiB
============================================================================
============================================================================
Thu May 26 16:40:05 UTC 2022
TestA                  10  1409   30 GiB    1.59k   37 GiB      0     20 PiB
TestB                  11   256  537 KiB        1  1.6 MiB      0    8.0 PiB
test1                  12    32   58 GiB    3.38k  174 GiB      0    8.0 PiB
test2                  13    64  916 KiB        5  3.2 MiB      0    8.0 PiB
============================================================================
============================================================================
Thu May 26 16:40:16 UTC 2022
TestA                  10  1407   30 GiB    1.59k   30 GiB      0     20 PiB
TestB                  11   256      0 B        1      0 B      0    8.0 PiB
test1                  12    32   58 GiB    3.38k   58 GiB      0    8.0 PiB
test2                  13    64  3.8 KiB        5  3.8 KiB      0    8.0 PiB
============================================================================
Snippet from `ceph df` taken during the merge process (columns: POOL, ID,
PGS, STORED, OBJECTS, USED, %USED, MAX AVAIL).

Pool info

TestA is a 10/2 EC pool
TestB is a 3x replicated pool for metadata
test1 is a 3x replicated pool for data
test2 is a 3x replicated pool for metadata


TestA Erasure-code-profile

root@Pikachu:~# ceph osd erasure-code-profile get TestA_ec_profile
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=2
plugin=jerasure
technique=reed_sol_van
w=8
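
For anyone who wants to reproduce this, the pools can be created roughly
like so (a sketch matching the pool list and profile above, not exact shell
history; it assumes the pg autoscaler is off or set to warn so the PG counts
stay where they are set):

ceph osd erasure-code-profile set TestA_ec_profile \
    k=10 m=2 plugin=jerasure technique=reed_sol_van \
    crush-device-class=hdd crush-failure-domain=host
ceph osd pool create TestA 4096 4096 erasure TestA_ec_profile
ceph osd pool create TestB 256 256 replicated
ceph osd pool create test1 32 32 replicated
ceph osd pool create test2 64 64 replicated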

root@Pikachu:~# ceph versions
{
    "mon": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 2267
    },
    "mds": {},
    "rgw": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 63
    },
    "overall": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 2336
    }
}

I've confirmed that the objects and their copies still exist within the
cluster, which makes me believe this is purely a reporting issue. If I had
to guess, the USED value is somehow being set to the STORED value: before
the change, USED matched the expected overhead (roughly 1.2x STORED for the
10/2 EC pool, 30 GiB -> 37 GiB, and 3x STORED for the replicated test1
pool, 58 GiB -> 174 GiB), while afterwards USED is exactly equal to STORED
for every pool. I've been able to reproduce the behavior consistently with
the following process (a command sketch follows after the list):

 - Create several pools, with one being EC 10/2

 - Set the EC 10/2 pool's pg_num and pgp_num to 4096

 - Put data into all pools

 - Lower the EC 10/2 pool's pg_num and pgp_num to 1024

 - When the EC 10/2 pool gets down to around 1400 PGs, `ceph df` starts
reporting the incorrect values

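A minimal command sequence for the steps above, assuming the pools were
created as sketched earlier (rados bench is just one convenient way to
generate data; any client I/O should do):

# put some data into the pools and leave it there
rados bench -p TestA 300 write --no-cleanup
rados bench -p test1 300 write --no-cleanup
# (repeat for the other pools as needed)

# with client I/O stopped, shrink the EC pool and watch the merge
ceph osd pool set TestA pg_num 1024
ceph osd pool set TestA pgp_num 1024
watch -n 10 ceph df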

As a workaround, to get `ceph df` to report the correct information again,
all that is needed is to increase the pg_num and pgp_num of any of the
three other pools, as in the example below.
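
For example, bumping one of the small replicated pools (pool name and new
PG count here are just an example):

ceph osd pool set test2 pg_num 128
ceph osd pool set test2 pgp_num 128

Once that change kicks in, `ceph df` goes back to showing the expected
USED values.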

Has anyone else noticed this behavior? Should I file a bug report or is
this already known?

Respectfully,
David
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


