Howdy Ceph-Users!

Over the past few days, I've noticed an interesting behavior in 15.2.15 that I'm curious if anyone else can reproduce. After setting up a few pools and running some load against them, I lowered the number of PGs in the TestA pool from 4096 to 1024. To track the progress of the PG merging, I stuck a watch on `ceph df` and let it run. No I/O was happening on the cluster while the PG count was being decreased. After a few hours I came back to find that the process had completed, but `ceph df` was now reporting usage values far different from what I had started with.

Snippets from the watch on `ceph df` taken during the merge process:

============================================================================
Thu May 26 16:39:55 UTC 2022
POOL    ID    PGS    STORED     OBJECTS    USED       %USED    MAX AVAIL
TestA   10    1412   30 GiB     1.59k      37 GiB     0        20 PiB
TestB   11    256    537 KiB    1          1.6 MiB    0        8.0 PiB
test1   12    32     58 GiB     3.38k      174 GiB    0        8.0 PiB
test2   13    64     916 KiB    5          3.2 MiB    0        8.0 PiB
============================================================================
============================================================================
Thu May 26 16:40:05 UTC 2022
POOL    ID    PGS    STORED     OBJECTS    USED       %USED    MAX AVAIL
TestA   10    1409   30 GiB     1.59k      37 GiB     0        20 PiB
TestB   11    256    537 KiB    1          1.6 MiB    0        8.0 PiB
test1   12    32     58 GiB     3.38k      174 GiB    0        8.0 PiB
test2   13    64     916 KiB    5          3.2 MiB    0        8.0 PiB
============================================================================
============================================================================
Thu May 26 16:40:16 UTC 2022
POOL    ID    PGS    STORED     OBJECTS    USED       %USED    MAX AVAIL
TestA   10    1407   30 GiB     1.59k      30 GiB     0        20 PiB
TestB   11    256    0 B        1          0 B        0        8.0 PiB
test1   12    32     58 GiB     3.38k      58 GiB     0        8.0 PiB
test2   13    64     3.8 KiB    5          3.8 KiB    0        8.0 PiB
============================================================================

Pool info:
TestA is a 10/2 EC pool
TestB is a 3x replicated pool for metadata
test1 is a 3x replicated pool for data
test2 is a 3x replicated pool for metadata

TestA erasure-code profile:

root@Pikachu:~# ceph osd erasure-code-profile get TestA_ec_profile
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=10
m=2
plugin=jerasure
technique=reed_sol_van
w=8

root@Pikachu:~# ceph versions
{
    "mon": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 3
    },
    "mgr": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 3
    },
    "osd": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 2267
    },
    "mds": {},
    "rgw": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 63
    },
    "overall": {
        "ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)": 2336
    }
}

I've confirmed that the objects and their copies still exist within the cluster, which makes me believe this is purely a reporting issue. If I had to guess, the USED value is somehow being set to the same value as STORED.

I've been able to reproduce the behavior consistently with the following process (roughly the commands sketched at the end of this mail):
- Create several pools, with one being EC 10/2
- Set the EC 10/2 pool's pg_num and pgp_num to 4096
- Put data into all of the pools
- Lower the EC 10/2 pool's pg_num and pgp_num to 1024
- Once the EC 10/2 pool is down to around 1400 PGs, `ceph df` starts reporting the skewed values

As a workaround, all that is needed to get `ceph df` to report the correct information again is to increase the pg_num and pgp_num of any of the three other pools (see the sketch below).

Has anyone else noticed this behavior? Should I file a bug report, or is this already known?
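For completeness, here is roughly what the reproduction looks like as commands. This is a sketch rather than my exact shell history; the replicated pool sizes, the bench duration, and the autoscaler line are illustrative, so adjust for your own cluster:

# EC profile and pools (profile matches the one shown above)
ceph osd erasure-code-profile set TestA_ec_profile \
    plugin=jerasure technique=reed_sol_van k=10 m=2 \
    crush-failure-domain=host crush-device-class=hdd
ceph osd pool create TestA 4096 4096 erasure TestA_ec_profile
ceph osd pool create TestB 256 256 replicated
ceph osd pool create test1 32 32 replicated
ceph osd pool create test2 64 64 replicated
ceph osd pool set TestA pg_autoscale_mode off   # keep the autoscaler out of the picture

# put some data into every pool (amounts are illustrative)
for p in TestA TestB test1 test2; do
    rados bench -p "$p" 120 write --no-cleanup
done

# start the merge and watch it
ceph osd pool set TestA pg_num 1024
ceph osd pool set TestA pgp_num 1024
while true; do date; ceph df; echo "===="; sleep 10; done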
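And the workaround in concrete terms looks something like this (test1 and the target of 64 are just for illustration; as noted above, increasing the PG count on any of the three other pools works):

ceph osd pool set test1 pg_num 64
ceph osd pool set test1 pgp_num 64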
Respectfully,
David