Oh yes, sounds like purging the rbd trash will be the real fix here! Good luck!

______________________________________________________
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com


On Mon, Jul 10, 2023 at 6:10 AM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
> I got a customer response: with payload size 4096, things got even
> worse. The mon startup time was now around 40 minutes. My doubts
> about decreasing the payload size seem confirmed. Then I read Dan's
> response again, which also mentions that the default payload size
> could be too small. So I asked them to double the default (2M instead
> of 1M) and am now waiting for a new result. I'm still wondering why
> this only happens when the mon is down for more than 5 minutes. Does
> anyone have an explanation for that time factor?
> Another thing they're going to do is to remove lots of snapshot
> tombstones (rbd mirroring snapshots in the trash namespace). Maybe
> that will reduce the number of osd_snap keys in the mon db, which
> should then also reduce the startup time. We'll see...
>
> Quoting Eugen Block <eblock@xxxxxx>:
>
> > Thanks, Dan!
> >
> >> Yes that sounds familiar from the luminous and mimic days.
> >> The workaround for zillions of snapshot keys at that time was to use:
> >> ceph config set mon mon_sync_max_payload_size 4096
> >
> > I actually did search for mon_sync_max_payload_keys, not bytes, so I
> > missed your thread, it seems. Thanks for pointing that out. So the
> > defaults seem to be these in Octopus:
> >
> > "mon_sync_max_payload_keys": "2000",
> > "mon_sync_max_payload_size": "1048576",
> >
> >> So it could be in your case that the sync payload is just too small to
> >> efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
> >> you should be able to understand what is taking so long, and tune
> >> mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.
> >
> > I'm confused: if the payload size is too small, why would decreasing
> > it help? Or am I misunderstanding something? But it probably won't
> > hurt to try it with 4096 and see if anything changes. If not, we can
> > still turn on debug logs and take a closer look.
> >
> >> And in addition to Dan's suggestion: HDD is not a good choice for
> >> RocksDB, which is most likely the reason for this thread. I think
> >> that from the third time on, the database just goes into compaction
> >> maintenance.
> >
> > Believe me, I know... but there's not much they can currently do
> > about it, quite a long story... I have been telling them that for
> > months now. Anyway, I will make some suggestions and report back
> > whether it worked in this case as well.
> >
> > Thanks!
> > Eugen
> >
> > Quoting Dan van der Ster <dan.vanderster@xxxxxxxxx>:
> >
> >> Hi Eugen!
> >>
> >> Yes that sounds familiar from the luminous and mimic days.
> >>
> >> Check this old thread:
> >> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/F3W2HXMYNF52E7LPIQEJFUTAD3I7QE25/
> >> (that thread is truncated but I can tell you that it worked for Frank).
> >> Also the even older referenced thread:
> >> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/
> >>
> >> The workaround for zillions of snapshot keys at that time was to use:
> >> ceph config set mon mon_sync_max_payload_size 4096
> >>
> >> That said, that sync issue was supposed to be fixed by way of adding the
> >> new option mon_sync_max_payload_keys, which has been around since
> >> nautilus.
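
For what it's worth, before touching either option I'd check what the
mons are currently running with, and then raise the keys limit together
with the size rather than shrinking anything. A minimal sketch; the
values below are only illustrative (doubling both defaults, in line with
the 2M you mention above), not tested recommendations:

  # show the currently effective values
  ceph config get mon mon_sync_max_payload_keys
  ceph config get mon mon_sync_max_payload_size

  # example: double both defaults (2000 keys / 1 MiB)
  ceph config set mon mon_sync_max_payload_keys 4000
  ceph config set mon mon_sync_max_payload_size 2097152
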
> >> So it could be in your case that the sync payload is just too small to
> >> efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
> >> you should be able to understand what is taking so long, and tune
> >> mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.
> >>
> >> Good luck!
> >>
> >> Dan
> >>
> >> ______________________________________________________
> >> Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com
> >>
> >>
> >> On Thu, Jul 6, 2023 at 1:47 PM Eugen Block <eblock@xxxxxx> wrote:
> >>
> >>> Hi *,
> >>>
> >>> I'm investigating an interesting issue on two customer clusters (used
> >>> for mirroring) that I haven't solved yet, but today we finally made
> >>> some progress. Maybe someone has an idea where to look next; I'd
> >>> appreciate any hints or comments.
> >>> These are two (latest) Octopus clusters, the main usage currently
> >>> being RBD mirroring in snapshot mode (around 500 RBD images are
> >>> synced every 30 minutes). They noticed very long startup times of
> >>> MON daemons after a reboot, between 10 and 30 minutes (reboot time
> >>> already subtracted). These delays are present on both sites. Today
> >>> we got a maintenance window and started to check in more detail by
> >>> just restarting the MON service (it joins quorum within seconds),
> >>> then stopping the MON service and waiting a few minutes (it still
> >>> joins quorum within seconds). And then we stopped the service and
> >>> waited for more than 5 minutes, simulating a reboot, and then we
> >>> were able to reproduce it. The sync then takes around 15 minutes;
> >>> we verified this with other MONs as well. The MON store is around
> >>> 2 GB in size (on HDD). I understand that the sync itself can take
> >>> some time, but what is the threshold here? I tried to find a hint
> >>> in the MON config, searching for timeouts of 300 seconds; there
> >>> were only a few matches (mon_session_timeout is one of them), but
> >>> I'm not sure they can explain this behavior.
> >>> Investigating the MON store (ceph-monstore-tool dump-keys) I noticed
> >>> that there were more than 42 million osd_snap keys, which is quite a
> >>> lot and would explain the size of the MON store. But I'm also not
> >>> sure if it's related to the long syncing process.
> >>> Does that sound familiar to anyone?
> >>>
> >>> Thanks,
> >>> Eugen
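
P.S. If you want to verify that removing the snapshot tombstones
actually shrinks the osd_snap history, something along these lines
should work. This is only a sketch: <pool> and the mon id are
placeholders, ceph-monstore-tool needs to run against a stopped mon,
and I'm assuming the tombstones show up as deferred-deletion images in
the pool's trash:

  # count osd_snap keys in the mon store (mon must be stopped first)
  ceph-monstore-tool /var/lib/ceph/mon/ceph-<id> dump-keys | grep -c '^osd_snap'

  # list the images pending deletion, then purge them
  rbd trash ls <pool>
  rbd trash purge <pool>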