Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

Hi,

Cool, thanks!

As for the global_id_reclaim settings:
root@proxmox01:~# ceph config get mon auth_allow_insecure_global_id_reclaim
false
root@proxmox01:~# ceph config get mon auth_expose_insecure_global_id_reclaim
true
root@proxmox01:~# ceph config get mon mon_warn_on_insecure_global_id_reclaim
true
root@proxmox01:~# ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed
true


—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / mark@xxxxxxxx


------ Original Message ------
From "Eugen Block" <eblock@xxxxxx>
To ceph-users@xxxxxxx
Date 02/02/2024, 08:30:45
Subject Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

I might have a reproducer: the second rebuilt mon is not joining the cluster either. I'll look into it and let you know if I find anything.

Quoting Eugen Block <eblock@xxxxxx>:

Hi,

Can anyone confirm that ancient (2017) leveldb database mons should just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?

at some point you had or have to remove one of the mons anyway to recreate it with a rocksdb backend, so the name mismatch should not be an issue here. I could confirm that when I tried to reproduce it in a small test cluster with leveldb. So now I have two leveldb MONs and one rocksdb MON (a rough sketch of the rebuild steps follows after the status output below):

jewel:~ # cat /var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel/kv_backend
rocksdb
jewel2:~ # cat /var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel2/kv_backend
leveldb
jewel3:~ # cat /var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel3/kv_backend
leveldb

And the cluster is healthy, although it took a minute or two for the rebuilt MON to sync (in a real cluster with some load etc. it might take longer):

jewel:~ # ceph -s
  cluster:
    id:     b08424fa-8530-4080-876d-2821c916d26c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum jewel2,jewel3,jewel (age 3m)
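
For anyone following along, a rough sketch of what such a rebuild looks like in a plain (non-containerized) deployment, based on the generic procedure from the Ceph docs. cephadm and Proxmox wrap these steps in their own tooling, and the hostname and paths here are from my test cluster, so adjust accordingly and only do this with the remaining MONs in quorum:

# remove the old leveldb mon from the monmap (keep its data dir aside as a backup)
ceph mon remove jewel
# fetch the current monmap and the mon. keyring for the rebuild
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring
# recreate the mon store; newly created stores default to rocksdb
ceph-mon --mkfs -i jewel --monmap /tmp/monmap --keyring /tmp/mon.keyring
systemctl start ceph-mon@jewel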

I'm wondering if this could be related to the insecure global_id reclaim settings. Can you send the output of:

ceph config get mon auth_allow_insecure_global_id_reclaim
ceph config get mon auth_expose_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed
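
If it turned out that these settings play a role here, the usual way to temporarily relax the hardening would be along these lines (a hedged sketch only; revert it as soon as all daemons and clients authenticate securely again):

ceph config set mon auth_allow_insecure_global_id_reclaim true
# ... and once the rebuilt mon has joined and everything is healthy again:
ceph config set mon auth_allow_insecure_global_id_reclaim false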



Quoting Mark Schouten <mark@xxxxxxxx>:

Hi,

I don’t have a fourth machine available, so that’s not an option unfortunately.

I did enable a lot of debugging earlier, but that shows no information as to why things are not working as expected.

Proxmox just deploys the mons, nothing fancy there, no special cases.

Can anyone confirm that ancient (2017) leveldb database mons should just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?

—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / mark@xxxxxxxx


------ Original Message ------
From "Eugen Block" <eblock@xxxxxx>
To ceph-users@xxxxxxx
Date 31/01/2024, 13:02:04
Subject Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

Hi Mark,

as I'm not familiar with Proxmox, I'm not sure what happens under the hood. There are a couple of things I would try, not necessarily in this order:

- Check the troubleshooting guide [1]; for example, a clock skew could be one reason. Have you verified ntp/chronyd functionality?
- Inspect debug log output, maybe first on the probing mon, and if that doesn't reveal the reason, enable debug logs for the other MONs as well:
ceph config set mon.proxmox03 debug_mon 20
ceph config set mon.proxmox03 debug_paxos 20

or for all MONs:
ceph config set mon debug_mon 20
ceph config set mon debug_paxos 20

- Try to deploy an additional MON on a different server (if you have more available) and see if that works.
- Does Proxmox log anything?
- Maybe as a last resort, try to start a MON manually after adding it to the monmap with monmaptool, but only if you know what you're doing (rough sketch below). I wonder if the monmap doesn't get updated...
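
To give an idea of what that would involve, a hedged sketch using the standard tools (names and addresses taken from your mon_status output below; stop mon.proxmox03 first and keep a backup of its data dir):

# grab the current monmap from the quorum and inspect it
ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap
# if the new mon is missing or wrong in the map, fix it offline
# (check monmaptool --help for the exact --add/--addv syntax of your release)
monmaptool --rm proxmox03 /tmp/monmap
monmaptool --addv proxmox03 [v2:10.10.10.3:3300,v1:10.10.10.3:6789] /tmp/monmap
# inject the corrected map into the stopped mon, then start it again
ceph-mon -i proxmox03 --inject-monmap /tmp/monmap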

Regards,
Eugen

[1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/

Quoting Mark Schouten <mark@xxxxxxxx>:

Hi,

During an upgrade from pacific to quincy, we needed to recreate the mons because they were pretty old and still using leveldb.

So step one was to destroy one of the mons. After that we recreated the monitor, and although it starts, it remains in state ‘probing’, as you can see below.
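
(For reference, the destroy/recreate was done with the standard Proxmox tooling, roughly something like the following, with the monitor ID as Proxmox names it on that node:)

root@proxmox03:~# pveceph mon destroy proxmox03
root@proxmox03:~# pveceph mon create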

No matter what I tried, it won’t come up. I’ve seen quite a few messages suggesting that the MTU might be an issue, but that seems to be OK:
root@proxmox03:/var/log/ceph# fping -b 1472 10.10.10.{1..3} -M
10.10.10.1 is alive
10.10.10.2 is alive
10.10.10.3 is alive
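
(Side note: -b 1472 only proves a 1500-byte MTU path; if jumbo frames are configured on the cluster network, a don't-fragment check against the larger MTU would look something like the line below, with 8972 assuming an MTU of 9000.)

root@proxmox03:/var/log/ceph# fping -b 8972 10.10.10.{1..3} -M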


Does anyone have an idea how to fix this? I’ve tried destroying and recreating the mon a few times now. Could it be that the leveldb mons only support mon.$id notation for the monitors?

root@proxmox03:/var/log/ceph# ceph daemon mon.proxmox03 mon_status
{
  "name": "proxmox03",
  "rank": 2,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "features": {
      "required_con": "2449958197560098820",
      "required_mon": [
          "kraken",
          "luminous",
          "mimic",
          "osdmap-prune",
          "nautilus",
          "octopus",
          "pacific",
          "elector-pinging"
      ],
      "quorum_con": "0",
      "quorum_mon": []
  },
  "outside_quorum": [
      "proxmox03"
  ],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": {
      "epoch": 0,
      "fsid": "39b1e85c-7b47-4262-9f0a-47ae91042bac",
      "modified": "2024-01-23T21:02:12.631320Z",
      "created": "2017-03-15T14:54:55.743017Z",
      "min_mon_release": 16,
      "min_mon_release_name": "pacific",
      "election_strategy": 1,
      "disallowed_leaders: ": "",
      "stretch_mode": false,
      "tiebreaker_mon": "",
      "removed_ranks: ": "2",
      "features": {
          "persistent": [
              "kraken",
              "luminous",
              "mimic",
              "osdmap-prune",
              "nautilus",
              "octopus",
              "pacific",
              "elector-pinging"
          ],
          "optional": []
      },
      "mons": [
          {
              "rank": 0,
              "name": "0",
              "public_addrs": {
                  "addrvec": [
                      {
                          "type": "v2",
                          "addr": "10.10.10.1:3300",
                          "nonce": 0
                      },
                      {
                          "type": "v1",
                          "addr": "10.10.10.1:6789",
                          "nonce": 0
                      }
                  ]
              },
              "addr": "10.10.10.1:6789/0",
              "public_addr": "10.10.10.1:6789/0",
              "priority": 0,
              "weight": 0,
              "crush_location": "{}"
          },
          {
              "rank": 1,
              "name": "1",
              "public_addrs": {
                  "addrvec": [
                      {
                          "type": "v2",
                          "addr": "10.10.10.2:3300",
                          "nonce": 0
                      },
                      {
                          "type": "v1",
                          "addr": "10.10.10.2:6789",
                          "nonce": 0
                      }
                  ]
              },
              "addr": "10.10.10.2:6789/0",
              "public_addr": "10.10.10.2:6789/0",
              "priority": 0,
              "weight": 0,
              "crush_location": "{}"
          },
          {
              "rank": 2,
              "name": "proxmox03",
              "public_addrs": {
                  "addrvec": [
                      {
                          "type": "v2",
                          "addr": "10.10.10.3:3300",
                          "nonce": 0
                      },
                      {
                          "type": "v1",
                          "addr": "10.10.10.3:6789",
                          "nonce": 0
                      }
                  ]
              },
              "addr": "10.10.10.3:6789/0",
              "public_addr": "10.10.10.3:6789/0",
              "priority": 0,
              "weight": 0,
              "crush_location": "{}"
          }
      ]
  },
  "feature_map": {
      "mon": [
          {
              "features": "0x3f01cfbdfffdffff",
              "release": "luminous",
              "num": 1
          }
      ]
  },
  "stretch_mode": false
}

—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / mark@xxxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



