Re: MON sync time depends on outage duration

Thanks, Dan!

> Yes that sounds familiar from the luminous and mimic days.
> The workaround for zillions of snapshot keys at that time was to use:
>    ceph config set mon mon_sync_max_payload_size 4096

I actually searched for mon_sync_max_payload_keys, not the byte-based option, so I missed your thread, it seems. Thanks for pointing that out. These seem to be the defaults in Octopus:

    "mon_sync_max_payload_keys": "2000",
    "mon_sync_max_payload_size": "1048576",

> So it could be in your case that the sync payload is just too small to
> efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
> you should be able to understand what is taking so long, and tune
> mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.

I'm confused: if the payload size is too small, why would decreasing it help? Or am I misunderstanding something? Either way, it probably won't hurt to try 4096 and see if anything changes. If not, we can still turn on debug logs and take a closer look.
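
For the record, I'd apply the workaround and the debug settings roughly like this (debug level 10 is just a guess at a reasonable starting point, and the levels should be reverted afterwards, since the MON logs grow quickly):

    # old workaround from the luminous/mimic threads
    ceph config set mon mon_sync_max_payload_size 4096

    # if that doesn't help, raise debug levels before the next sync test
    ceph config set mon debug_mon 10
    ceph config set mon debug_paxos 10

    # revert to the defaults when done
    ceph config rm mon debug_mon
    ceph config rm mon debug_paxos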

> And in addition to Dan's suggestion: an HDD is not a good choice for RocksDB, which is most likely the reason for this thread. I think that from the third time on, the database just goes into compaction maintenance.

Believe me, I know... but there's not much they can currently do about it (quite a long story), and I have been telling them that for months now. Anyway, I will make some suggestions and report back whether it worked in this case as well.

Thanks!
Eugen

Quoting Dan van der Ster <dan.vanderster@xxxxxxxxx>:

Hi Eugen!

Yes that sounds familiar from the luminous and mimic days.

Check this old thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/F3W2HXMYNF52E7LPIQEJFUTAD3I7QE25/
(that thread is truncated but I can tell you that it worked for Frank).
Also the even older referenced thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/

The workaround for zillions of snapshot keys at that time was to use:
   ceph config set mon mon_sync_max_payload_size 4096

That said, that sync issue was supposed to be fixed by way of adding the
new option mon_sync_max_payload_keys, which has been around since nautilus.

So it could be in your case that the sync payload is just too small to
efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
you should be able to understand what is taking so long, and tune
mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.

Good luck!

Dan

______________________________________________________
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com



On Thu, Jul 6, 2023 at 1:47 PM Eugen Block <eblock@xxxxxx> wrote:

Hi *,

I'm investigating an interesting issue on two customer clusters (used
for mirroring) that I haven't solved yet, but today we finally made
some progress. Maybe someone has an idea where to look next; I'd
appreciate any hints or comments.
These are two (latest) Octopus clusters whose main usage currently is
RBD mirroring in snapshot mode (around 500 RBD images are synced every
30 minutes). They noticed very long startup times of the MON daemons
after a reboot, between 10 and 30 minutes (reboot time already
subtracted). These delays are present on both sites. Today we got a
maintenance window and started to check in more detail: just
restarting the MON service (it joins quorum within seconds), then
stopping the MON service and waiting a few minutes (it still joins
quorum within seconds). Then we stopped the service and waited for
more than 5 minutes, simulating a reboot, and we were able to
reproduce the issue: the sync takes around 15 minutes, which we
verified with other MONs as well. The MON store is around 2 GB in
size (on HDD). I understand that the sync itself can take some time,
but what is the threshold here? I tried to find a hint in the MON
config, searching for timeouts of 300 seconds; there were only a few
matches (mon_session_timeout is one of them), but I'm not sure they
can explain this behavior.
Investigating the MON store (ceph-monstore-tool dump-keys) I noticed
that there were more than 42 million osd_snap keys, which is quite a
lot and would explain the size of the MON store. But I'm also not
sure whether it's related to the long syncing process.
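
For completeness, I counted the keys roughly like this, with the MON
stopped so that ceph-monstore-tool gets exclusive access to the store
(the path assumes the default cluster layout):

    ceph-monstore-tool /var/lib/ceph/mon/ceph-<mon-id> dump-keys | grep -c osd_snap
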
Does that sound familiar to anyone?

Thanks,
Eugen
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


