Re: constant increase in osdmap epoch

Hi Eugen.

> how many changes do you see?
About 1 new map every 5-10 seconds. The time interval varies.
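
To put that rate next to the per-day figures, here is a rough sketch of the arithmetic. The retained-epoch count of 500 is an assumption (I believe it is the default of mon_min_osdmap_epochs, but please check your own configuration), and the function names are purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the mons keep at least this many osdmap epochs.
# 500 is believed to be the default of mon_min_osdmap_epochs; verify locally.
RETAINED_EPOCHS = 500


def epochs_per_day(seconds_per_epoch: float) -> float:
    """Convert 'one new map every N seconds' into maps per day."""
    return SECONDS_PER_DAY / seconds_per_epoch


def retention_window_minutes(retained_epochs: int, seconds_per_epoch: float) -> float:
    """Time it takes for `retained_epochs` new maps to be created, in minutes."""
    return retained_epochs * seconds_per_epoch / 60


if __name__ == "__main__":
    # One map every 5 to 10 seconds, as observed on this cluster.
    for interval in (5, 10):
        print(
            f"{interval:>3} s/epoch: {epochs_per_day(interval):8.0f} epochs/day, "
            f"{retention_window_minutes(RETAINED_EPOCHS, interval):7.1f} min "
            f"until {RETAINED_EPOCHS} epochs have passed"
        )
```

With one map every 5 to 10 seconds this works out to roughly 8640 to 17280 epochs per day, compared with the 60 to 200 per day mentioned for the other clusters.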

> but I would first try to find out what exactly is causing them
That's what I'm trying to do. However, I've run out of ideas about what to look for. I followed up on all the cases I could find, to no avail. The leader just says that there is a new map, nothing else:

2024-11-18T15:57:37.147396+0100 mon.ceph-01 (mon.0) 3081080 : cluster [DBG] osdmap e3075180: 1317 total, 1308 up, 1308 in

The MGRs are equally shy:

2024-11-18T15:51:18.159+0100 7fdfc9437700  0 [progress INFO root] Processing OSDMap change 3075139..3075140

I have no clue what else to look for and where.
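
To at least quantify the rate from the leader's cluster log, a minimal sketch like the following could be used. It only assumes that the lines look like the mon line quoted above; the regular expression and the function names are illustrative, not part of any Ceph tooling.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Matches cluster log lines of the form quoted above, e.g.
# "<timestamp> mon.ceph-01 (mon.0) 3081080 : cluster [DBG] osdmap e3075180: ..."
OSDMAP_LINE = re.compile(
    r"^(?P<ts>\S+) mon\.\S+ \(mon\.\d+\) \d+ : cluster \[DBG\] osdmap e(?P<epoch>\d+):"
)


def parse_osdmap_events(lines: Iterable[str]) -> List[Tuple[datetime, int]]:
    """Return (timestamp, epoch) pairs for every 'osdmap eN' line; skip the rest."""
    events = []
    for line in lines:
        match = OSDMAP_LINE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, int(match.group("epoch"))))
    return events


def mean_seconds_per_epoch(events: List[Tuple[datetime, int]]) -> Optional[float]:
    """Average seconds per epoch between the lowest and highest epoch seen."""
    if len(events) < 2:
        return None
    ordered = sorted(events, key=lambda event: event[1])
    (t_first, e_first), (t_last, e_last) = ordered[0], ordered[-1]
    if e_last == e_first:
        return None
    return (t_last - t_first).total_seconds() / (e_last - e_first)


if __name__ == "__main__":
    # The single mon line quoted in this message.
    sample = (
        "2024-11-18T15:57:37.147396+0100 mon.ceph-01 (mon.0) 3081080 : "
        "cluster [DBG] osdmap e3075180: 1317 total, 1308 up, 1308 in"
    )
    for ts, epoch in parse_osdmap_events([sample]):
        print(ts.isoformat(), epoch)
```

Feeding it the lines of a longer log excerpt and calling mean_seconds_per_epoch() on the result gives the average interval over that excerpt.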

The cluster is rebalancing at the moment, so there are actual osdmap changes along the way. However, the vast majority of map changes have an empty diff and must have a different cause.
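
By "empty diff" I mean that two successive text dumps differ only in their "epoch" and "modified" lines. A minimal sketch of that check, assuming plain-text dumps like the map.<epoch>.txt files in my first message (the helper names are hypothetical):

```python
import difflib
from typing import List

# Lines starting with these prefixes change on every epoch and are ignored.
IGNORED_PREFIXES = ("epoch ", "modified ")


def substantive_lines(dump_text: str) -> List[str]:
    """Drop the lines that differ between any two epochs by definition."""
    return [
        line for line in dump_text.splitlines()
        if not line.startswith(IGNORED_PREFIXES)
    ]


def substantive_diff(old_dump: str, new_dump: str) -> List[str]:
    """Unified diff of two dumps, ignoring 'epoch' and 'modified' lines.

    An empty list means the new epoch carried no visible change.
    """
    return list(
        difflib.unified_diff(
            substantive_lines(old_dump),
            substantive_lines(new_dump),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Only the epoch and modified lines are taken from the real dumps;
    # the remaining line is a placeholder for the unchanged content.
    old = (
        "epoch 3075085\n"
        "placeholder for unchanged content\n"
        "modified 2024-11-18T15:38:45.512100+0100\n"
    )
    new = (
        "epoch 3075086\n"
        "placeholder for unchanged content\n"
        "modified 2024-11-18T15:38:47.858092+0100\n"
    )
    print("empty diff" if not substantive_diff(old, new) else "real change")
```

Running this over each pair of successive dumps would separate the epochs caused by rebalancing from the ones with no visible change.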

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, November 18, 2024 4:38 PM
To: ceph-users@xxxxxxx
Subject:  Re: constant increase in osdmap epoch

Hi Frank,

do you use snapshots a lot? Purging snaps would create a new osdmap as
well. Have you checked debug logs of the mon leader to see what
triggers the osdmap change? In our moderately used Pacific cluster I
see "only" around 60 osdmap changes per day (I haven't looked too
deeply yet); in a much busier customer cluster I see around 200 epoch
changes per day. How many changes do you see? I guess you could play
with the pruning configs (many of the defaults don't necessarily fit a
production load), but I would first try to find out what exactly is
causing them.

Regards,
Eugen

Zitat von Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> we observe a problem that has been reported before, but I can't find
> a resolution. This is related to an earlier thread "failed to load
> OSD map for epoch 2898146, got 0 bytes"
> (https://www.spinics.net/lists/ceph-users/msg84485.html).
>
> We run a cluster on the latest Octopus release and observe the osdmap
> epoch increasing every few seconds. There is no change in the contents
> between two successive epochs:
>
> # diff map.3075085.txt map.3075086.txt
> 1c1
> < epoch 3075085
> ---
> > epoch 3075086
> 4c4
> < modified 2024-11-18T15:38:45.512100+0100
> ---
> > modified 2024-11-18T15:38:47.858092+0100
>
> This is exactly what others reported too, for example, "steady
> increasing of osd map epoch since octopus"
> (https://www.spinics.net/lists/ceph-users/msg69443.html). It's a real
> problem since it dramatically shortens the time window an OSD can be
> down before its latest OSD map is purged from the cluster. This, in
> turn, leads to serious follow-up problems with OSD restart as
> reported in the thread I'm referring to at the beginning.
>
> Related to that I also see the mgrs increasing the pgmap version
> constantly every 2 seconds. However, I believe this is intentional.
>
> I don't see this redundant pgp_num_actual setting by the mgrs
> reported here: https://tracker.ceph.com/issues/51433 .
>
> I can't find a resolution anywhere. Any help would be very much appreciated.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx