Re: device_health_metrics pool automatically recreated

Eugen Block <eblock@xxxxxx> · Sun, 29 Sep 2024 20:47:17 +0000

So I was able to reproduce it. I created an Octopus cluster, created a
couple of OSDs, and the device_health_metrics pool was automatically
created as expected:

2024-09-29T20:14:21.225+0000 7ff3b6c8e700  0 mon.soc9-ceph@0(leader)  
e1 handle_command mon_command({"prefix": "osd pool rename", "format":  
"json", "srcpool": "device_health_metrics", "destpool": ".mgr"} v 0) v1
2024-09-29T20:14:22.233+0000 7ff3b548b700  0 log_channel(audit) log  
[INF] : from='mgr.14302 192.168.124.186:0/3683659721'  
entity='mgr.soc9-ceph.vgsrao' cmd='[{"prefix": "osd pool rename",  
"format": "json", "srcpool": "device_health_metrics", "destpool":  
".mgr"}]': finished

After creating a test pool, I upgraded the cluster to Quincy, and then
the device_health_metrics pool was recreated:

2024-09-29T20:24:30.829+0000 7f9289bf4700  0 mon.soc9-ceph@0(leader)  
e1 handle_command mon_command({"prefix": "osd pool create", "format":  
"json", "pool": "device_health_metrics", "pg_num": 1, "pg_num_min": 1}  
v 0) v1

This happened after the first MGR had been upgraded and the cluster had
failed over to the old one. So your assumption seems to be correct. I haven't checked
other upgrade paths, so this probably isn't a big deal. But perhaps a  
note in the docs could mention that there might be a new pool  
after/during the upgrade?
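
For anyone who wants to look for the same pattern in their own mon logs, here is a minimal Python sketch. It pulls the JSON payload out of a "handle_command mon_command(...)" entry and reports whether it creates device_health_metrics or renames it away. The helper names are hypothetical, the regex is only derived from the two log lines quoted above (rejoined onto one line each), and other log formats may need adjustments.

```python
import json
import re

# Matches the JSON payload of a mon "handle_command" entry, e.g.
#   ... handle_command mon_command({...} v 0) v1
# Assumption: the payload is a single JSON object followed by " v <n>)".
MON_COMMAND_RE = re.compile(r"mon_command\((\{.*\}) v \d+\)")


def parse_mon_command(log_line):
    """Return the command as a dict, or None if the line has no mon_command."""
    match = MON_COMMAND_RE.search(log_line)
    if match is None:
        return None
    return json.loads(match.group(1))


def touches_legacy_pool(cmd, pool="device_health_metrics"):
    """True if cmd creates the given pool or renames it to something else."""
    if cmd is None:
        return False
    if cmd.get("prefix") == "osd pool create":
        return cmd.get("pool") == pool
    if cmd.get("prefix") == "osd pool rename":
        return cmd.get("srcpool") == pool
    return False


# The two mon log entries from this thread, rejoined onto one line each.
RENAME_LINE = (
    '2024-09-29T20:14:21.225+0000 7ff3b6c8e700  0 mon.soc9-ceph@0(leader) '
    'e1 handle_command mon_command({"prefix": "osd pool rename", "format": '
    '"json", "srcpool": "device_health_metrics", "destpool": ".mgr"} v 0) v1'
)
CREATE_LINE = (
    '2024-09-29T20:24:30.829+0000 7f9289bf4700  0 mon.soc9-ceph@0(leader) '
    'e1 handle_command mon_command({"prefix": "osd pool create", "format": '
    '"json", "pool": "device_health_metrics", "pg_num": 1, "pg_num_min": 1} '
    'v 0) v1'
)

if __name__ == "__main__":
    for line in (RENAME_LINE, CREATE_LINE):
        cmd = parse_mon_command(line)
        print(line[:28], cmd["prefix"], touches_legacy_pool(cmd))
```

A create entry that is logged after a rename entry is the sequence I saw here.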

Thanks,
Eugen

Quoting Eugen Block <eblock@xxxxxx>:

Thanks for chiming in, Patrick.

Although I can't rule it out, I doubt that anyone except me was on  
the cluster after we performed the upgrade. It had a very low  
priority for the customer. Do you think that if I deleted the  
device_health_metrics pool and started a legacy mgr, it would  
recreate the pool? I think I should be able to try that, just to  
confirm.

Quoting Patrick Donnelly <pdonnell@xxxxxxxxxx>:

On Tue, Aug 27, 2024 at 6:49 AM Eugen Block <eblock@xxxxxx> wrote:

Hi,

I just looked into one customer cluster that we upgraded some time ago
from Octopus to Quincy (17.2.6) and I'm wondering why there are still
both pools, "device_health_metrics" and ".mgr".

According to the docs [0], it's supposed to be renamed:

Prior to Quincy, the devicehealth module created a
device_health_metrics pool to store device SMART statistics. With
Quincy, this pool is automatically renamed to be the common manager
module pool.

Now only .mgr has data while device_health_metrics is empty, but it
has a newer ID:

ses01:~ # ceph df | grep -E "device_health|.mgr"
.mgr                            1     1   68 MiB       18  204 MiB      0    254 TiB
device_health_metrics          15     1      0 B        0      0 B      0    254 TiB

On a test cluster (meanwhile upgraded to latest Reef) I see the same:

ceph01:~ # ceph df | grep -E "device_health_metrics|.mgr"
.mgr                        38    1  577 KiB        2  1.7 MiB      0     71 GiB
device_health_metrics       45    1      0 B        0      0 B      0     71 GiB

Since there are still many users who haven't upgraded to >= Quincy
yet, this should be clarified/fixed. I briefly checked
tracker.ceph.com, but didn't find anything related to this. I'm
currently trying to reproduce it on a one-node test cluster which I
upgraded from Pacific to Quincy, but no results yet, only that the
renaming was successful. But for the other clusters I don't have
enough logs to find out how/why the device_health_metrics pool had
been recreated.

Probably someone ran a pre-Quincy ceph-mgr on the cluster after the
upgrade? That would explain the larger pool id.
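
A minimal Python sketch of that check, for clusters where the logs are gone: it flags the case where both pools exist, device_health_metrics has the larger pool ID, and it holds no objects. It assumes the JSON form of "ceph df" exposes a "pools" list whose entries carry "name", "id" and "stats.objects". Those field names are an assumption to verify against your release, and the function name is hypothetical. The sample data only mirrors the IDs and object counts from the first "ceph df" output quoted above.

```python
def find_recreated_legacy_pool(df, legacy="device_health_metrics", current=".mgr"):
    """Return the legacy pool's ID if it looks recreated after the rename.

    "Recreated" here means: both pools exist, the legacy pool has the
    larger (newer) ID, and it contains no objects. Returns None otherwise.
    """
    pools = {pool["name"]: pool for pool in df.get("pools", [])}
    old = pools.get(legacy)
    new = pools.get(current)
    if old is None or new is None:
        return None
    if old["id"] > new["id"] and old["stats"]["objects"] == 0:
        return old["id"]
    return None


# Hand-built sample using the pool IDs and object counts from the customer
# cluster in this thread (assumed JSON layout, not real command output).
SAMPLE_DF = {
    "pools": [
        {"name": ".mgr", "id": 1, "stats": {"objects": 18}},
        {"name": "device_health_metrics", "id": 15, "stats": {"objects": 0}},
    ]
}

if __name__ == "__main__":
    print(find_recreated_legacy_pool(SAMPLE_DF))
```

On a live cluster the input could come from json.loads() on the output of "ceph df --format json". The check only reports the pattern and does not say which daemon created the pool.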

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
