Re: steady increasing of osd map epoch since octopus

Hi Dan,

Thanks for the hint.
The cluster is not undergoing any changes (rebalancing, merging, splitting, or
anything like that); there is only normal client traffic via librados.

In the mon log I regularly see the following messages, which seem to
correlate with the osd map "changes":

2021-11-08T14:15:58.915+0100 7f8bd32a3700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd32a3700' had timed out after 0.000000000s
2021-11-08T14:15:58.953+0100 7f8bd3aa4700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd3aa4700' had timed out after 0.000000000s
2021-11-08T14:15:59.201+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970041 e1970041: 125 total, 125 up, 125 in
2021-11-08T14:15:59.242+0100 7f8bd4aa6700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd4aa6700' had timed out after 0.000000000s
2021-11-08T14:15:59.480+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970042 e1970042: 125 total, 125 up, 125 in
2021-11-08T14:15:59.484+0100 7f8bd32a3700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd32a3700' had timed out after 0.000000000s
2021-11-08T14:15:59.520+0100 7f8bd42a5700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd42a5700' had timed out after 0.000000000s
2021-11-08T14:15:59.757+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970043 e1970043: 125 total, 125 up, 125 in
2021-11-08T14:15:59.797+0100 7f8bd3aa4700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd3aa4700' had timed out after 0.000000000s
2021-11-08T14:16:00.047+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970044 e1970044: 125 total, 125 up, 125 in
2021-11-08T14:16:00.051+0100 7f8bd4aa6700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd4aa6700' had timed out after 0.000000000s
2021-11-08T14:16:00.087+0100 7f8bd32a3700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd32a3700' had timed out after 0.000000000s
2021-11-08T14:16:00.329+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970045 e1970045: 125 total, 125 up, 125 in
2021-11-08T14:16:00.369+0100 7f8bd4aa6700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd4aa6700' had timed out after 0.000000000s
2021-11-08T14:16:00.635+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970046 e1970046: 125 total, 125 up, 125 in
2021-11-08T14:16:00.640+0100 7f8bd32a3700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd32a3700' had timed out after 0.000000000s
2021-11-08T14:16:00.674+0100 7f8bd3aa4700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd3aa4700' had timed out after 0.000000000s
2021-11-08T14:16:00.930+0100 7f8bd2aa2700  1 mon.csdeveubs-u02c01mon03@2(peon).osd e1970047 e1970047: 125 total, 125 up, 125 in
2021-11-08T14:16:00.968+0100 7f8bd32a3700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f8bd32a3700' had timed out after 0.000000000s


Timeouts after 0.0 seconds?
In between these timeouts the osdmap epoch is increasing. This happens
in bursts; between the bursts there is no new map epoch.
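
To see what is actually changing between two of these epochs, I think
something like the following should work (just a rough sketch; it assumes
the epochs from the log above are still within the mons' retained osdmap
history):

# fetch two consecutive osdmaps (epoch numbers taken from the log above)
ceph osd getmap 1970041 -o /tmp/osdmap.1970041
ceph osd getmap 1970042 -o /tmp/osdmap.1970042
# print both maps and diff them to see what changed between the epochs
osdmaptool --print /tmp/osdmap.1970041 > /tmp/osdmap.1970041.txt
osdmaptool --print /tmp/osdmap.1970042 > /tmp/osdmap.1970042.txt
diff -u /tmp/osdmap.1970041.txt /tmp/osdmap.1970042.txt

If the diff only shows transient entries (pg_temp and the like),
temporarily raising the mon debug levels (e.g.
ceph tell mon.* injectargs '--debug_mon 10 --debug_paxos 10', reverted
afterwards) might show which component is proposing the new maps, along
the lines of the debugging in the tracker you linked.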


Manuel


On Mon, 8 Nov 2021 13:01:06 +0100
Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> Hi,
> 
> Okay. Here is another case which was churning the osdmaps:
> https://tracker.ceph.com/issues/51433
> Perhaps similar debugging will show what's creating the maps in your
> case.
> 
> Cheers, Dan
> 
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


