On Mon, Nov 8, 2021 at 6:03 AM Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:
> Okay.
> The default value for paxos_propose_interval seems to be "1.0", not
> "2.0". But anyway, reducing it to 0.25 seems to fix this issue on our
> testing cluster.
>
> I wanted to test some failure scenarios with this value and had a look
> at the osdmap epoch to check how many new maps would be created.
> On the corresponding graph I saw that since the update to Octopus
> (and in Nautilus too) the epoch has been continuously increasing (see my
> other mail). The diff between two maps is empty, except for the epoch
> and creation date.

That is concerning. Can you set debug_mon = 20 and capture a minute or
so of logs? (Enough to include a few osdmap epochs.) You can use
ceph-post-file to send it to us.

Thanks!
sage

>
> Manuel
>
> On Fri, 5 Nov 2021 18:33:58 -0500
> Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> > Yeah, I think two different things are going on here.
> >
> > The read leases were new, and I think the way that OSDs are marked
> > down is the key thing that affects that behavior. I'm a bit
> > surprised that the _notify_mon option helps there, and will take a
> > closer look at that on Monday to make sure it's doing what it's
> > supposed to be doing.
> >
> > The paxos_propose_interval is an upper bound on how long the monitor
> > is allowed to batch updates before committing them. Many/most
> > changes are committed immediately, but the osdmap management tries to
> > batch things up so that a single osdmap epoch combines lots of
> > changes when they are happening quickly (there tend to be
> > mini-storms of updates when cluster changes happen). The default of
> > 2s might be too much for many environments, though... and we might
> > consider changing the default to something smaller (maybe more like
> > 250ms).
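[For anyone following the thread: the capture Sage asks for, and the interval tuning Manuel tested, can be sketched roughly as below. These are standard Ceph CLI commands run on a live cluster with an admin keyring; the log path and 60-second window are illustrative, not prescriptive.]

```shell
# Raise monitor debug logging at runtime (debug_mon defaults to 1/5)
ceph config set mon debug_mon 20

# Let it run long enough to cover a few osdmap epochs
sleep 60

# Restore the default log level
ceph config set mon debug_mon 1/5

# Upload the monitor log for the developers; ceph-post-file prints a
# tag that can then be shared on-list (log path varies per deployment)
ceph-post-file /var/log/ceph/ceph-mon.$(hostname -s).log

# The tuning discussed above: lower the max batching window for
# proposals from the 2s default to 250ms
ceph config set mon paxos_propose_interval 0.25
```

Note these commands act on a running cluster, so the effect depends on the deployment; the config changes take effect without a monitor restart.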
> >
> > sage
>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx