Hi Eugen.

> how many changes do you see?

About 1 new map every 5-10 seconds. The time interval varies.

> but I would first try to find out what exactly is causing them

That's what I'm trying to do. However, I'm out of ideas about what to look for. I followed up on all the cases I could find, to no avail. The leader just says that there is a new map, nothing else:

2024-11-18T15:57:37.147396+0100 mon.ceph-01 (mon.0) 3081080 : cluster [DBG] osdmap e3075180: 1317 total, 1308 up, 1308 in

The MGRs are equally shy:

2024-11-18T15:51:18.159+0100 7fdfc9437700 0 [progress INFO root] Processing OSDMap change 3075139..3075140

I have no clue what else to look for or where. The cluster is rebalancing at the moment, so there are actual osdmap changes along the way. However, the vast majority of map changes have an empty diff and must have a different cause.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, November 18, 2024 4:38 PM
To: ceph-users@xxxxxxx
Subject: Re: constant increase in osdmap epoch

Hi Frank,

do you use snapshots a lot? Purging snaps would create a new osdmap as well. Have you checked the debug logs of the mon leader to see what triggers the osdmap change?

I see "only" around 60 osdmap changes per day in our moderately used Pacific cluster (I haven't looked too deep yet), and around 200 epoch changes per day in a much busier customer cluster. How many changes do you see?

I guess you could play with the pruning configs (many of the defaults don't necessarily fit a production load), but I would first try to find out what exactly is causing them.

Regards,
Eugen

Quoting Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> we observe a problem that has been reported before, but I can't find
> a resolution. This is related to an earlier thread "failed to load
> OSD map for epoch 2898146, got 0 bytes"
> (https://www.spinics.net/lists/ceph-users/msg84485.html).
>
> We run a cluster on the latest Octopus release and observe a constant
> increase in osdmap epoch every few seconds. There is no change in the
> contents between two successive epochs:
>
> # diff map.3075085.txt map.3075086.txt
> 1c1
> < epoch 3075085
> ---
> > epoch 3075086
> 4c4
> < modified 2024-11-18T15:38:45.512100+0100
> ---
> > modified 2024-11-18T15:38:47.858092+0100
>
> This is exactly what others reported too, for example, "steady
> increasing of osd map epoch since octopus"
> (https://www.spinics.net/lists/ceph-users/msg69443.html). It's a real
> problem since it dramatically shortens the time window an OSD can be
> down before its latest OSD map is purged from the cluster. This, in
> turn, leads to serious follow-up problems with OSD restart, as
> reported in the thread I referred to at the beginning.
>
> Related to that, I also see the mgrs increasing the pgmap version
> constantly every 2 seconds. However, I believe this is intentional.
>
> I don't see the redundant pgp_num_actual setting by the mgrs
> reported here: https://tracker.ceph.com/issues/51433 .
>
> I can't find a resolution anywhere. Any help would be very much appreciated.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
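
To put numbers on "empty diff" and "every 5-10 seconds", something along these lines might help. This is only a rough sketch under assumptions: it assumes the osdmap text dumps have the shape shown in the diff in this thread (an `epoch` line and a `modified` line that always change), and that the mon leader's cluster log lines look like the one quoted in this thread. The function names `is_empty_epoch` and `epoch_intervals` and the regular expression are made up for illustration and are not part of Ceph.

```python
import re
from datetime import datetime

# Lines that differ between any two successive epochs, even "empty" ones
# (taken from the diff output quoted in this thread).
IGNORED_PREFIXES = ("epoch ", "modified ")


def is_empty_epoch(old_dump: str, new_dump: str) -> bool:
    """Return True if two osdmap text dumps differ only in epoch/modified."""
    def significant(dump: str) -> list:
        return [line for line in dump.splitlines()
                if not line.startswith(IGNORED_PREFIXES)]
    return significant(old_dump) == significant(new_dump)


# Assumed cluster log format, modelled on the single line quoted in this thread:
# <timestamp> <mon name> (mon.N) <seq> : cluster [DBG] osdmap e<epoch>: ...
OSDMAP_LINE = re.compile(
    r"^(\S+) \S+ \(mon\.\d+\) \d+ : cluster \[DBG\] osdmap e(\d+):"
)


def epoch_intervals(log_lines):
    """Yield (epoch, seconds since the previous osdmap log line)."""
    previous = None
    for line in log_lines:
        match = OSDMAP_LINE.match(line)
        if not match:
            continue  # not an osdmap line, skip it
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
        if previous is not None:
            yield int(match.group(2)), (stamp - previous).total_seconds()
        previous = stamp
```

Feeding the leader's cluster log into `epoch_intervals` gives the spacing between epochs, and running `is_empty_epoch` over successive dumps separates the maps caused by rebalancing from the ones that carry no change at all. Lining up the timestamps of the empty epochs with other log entries may then hint at what triggers them.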