Re: MON slow ops and growing MON store

Hi,

> Like last time, after I restarted all five MONs, the store size
> decreased and everything went back to normal. I also had to restart MGRs
> and MDSs afterwards. This starts looking like a bug to me.

In our case, there was real database corruption in RocksDB that caused the version counters to mismatch the actual data. I wrote a repair routine that should fix such cases; it worked here:

https://github.com/ceph/ceph/pull/44511

kind regards
 Daniel

Janek


On 26/02/2021 15:24, Janek Bevendorff wrote:
Since the full cluster restart and disabling logging to syslog, it's not a problem any more (for now).

Unfortunately, just disabling clog_to_monitors didn't have the desired effect when I tried it yesterday, but I do believe it is somehow related. I could not find any specific cause for yesterday's incident in the logs apart from a few more RocksDB status and compaction messages than usual, but those look more like a symptom than a cause.


On 26/02/2021 13:05, Mykola Golub wrote:
On Thu, Feb 25, 2021 at 08:58:01PM +0100, Janek Bevendorff wrote:

On the first MON, the command doesn’t even return, but I was able to
get a dump from the one I restarted most recently. The oldest ops
look like this:

         {
             "description": "log(1000 entries from seq 17876238 at 2021-02-25T15:13:20.306487+0100)",
             "initiated_at": "2021-02-25T20:40:34.698932+0100",
             "age": 183.762551121,
             "duration": 183.762599201,

The mon stores cluster log messages in the mon db. You mentioned
problems with osds flooding with log messages. It looks related.
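A minimal sketch for checking whether the slow ops are dominated by cluster-log messages, as suggested above. It assumes the ops dump is JSON with a top-level `ops` list whose entries carry `description` and `age` fields like the excerpt quoted earlier; the function name `summarize_ops` and the description pattern are hypothetical, written against that one excerpt only.

```python
import json
import re

# Hypothetical pattern, derived only from the quoted excerpt, e.g.
# "log(1000 entries from seq 17876238 at 2021-02-25T15:13:20.306487+0100)"
LOG_OP_RE = re.compile(r"log\((\d+) entries from seq \d+")


def summarize_ops(dump_text: str) -> dict:
    """Count how many in-flight ops are cluster-log batches.

    Assumes a JSON object with an "ops" list; each op is expected to have
    "description" and "age" (seconds) fields, as in the excerpt above.
    """
    ops = json.loads(dump_text).get("ops", [])
    log_ops = 0
    log_entries = 0
    for op in ops:
        match = LOG_OP_RE.match(op.get("description", ""))
        if match:
            log_ops += 1
            log_entries += int(match.group(1))  # entries in this log batch
    # Age of the oldest op in seconds, or 0.0 if there are no ops at all.
    oldest = max((float(op.get("age", 0.0)) for op in ops), default=0.0)
    return {
        "total_ops": len(ops),
        "log_ops": log_ops,
        "log_entries": log_entries,
        "oldest_age_s": oldest,
    }
```

Fed a saved dump whose only op is the quoted `log(...)` entry, it would report one log op carrying 1000 entries. If most of the oldest ops look like that, it supports the theory that the OSD log flood is what keeps the MON busy.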

If you still observe the db growth, you may try temporarily disabling
clog_to_monitors, i.e. set for all osds:

  clog_to_monitors = false

And see if it stops growing after this and if it helps with the slow
ops (it might make sense to restart mons if some appear to be
stuck). You can apply the config option on the fly (without restarting
the osds, e.g. with injectargs), but when re-enabling it you will
have to restart the osds to avoid crashes due to this bug [1].

[1] https://tracker.ceph.com/issues/48946
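A minimal sketch of the "see if it stops growing" check suggested above: sample the on-disk size of a MON's RocksDB store directory a few times and compare the first and last samples. The store.db location depends on the deployment and is deliberately not hard-coded; the function names and the `min_growth` threshold are hypothetical.

```python
import os
import time
from typing import List, Sequence


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of the files under `path` (e.g. a MON's store.db)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except FileNotFoundError:
                # RocksDB may delete files (e.g. during compaction) mid-walk.
                continue
    return total


def sample_sizes(path: str, count: int, interval_s: float) -> List[int]:
    """Take `count` size samples, sleeping `interval_s` seconds between them."""
    sizes: List[int] = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        sizes.append(dir_size_bytes(path))
    return sizes


def is_growing(samples: Sequence[int], min_growth: float = 0.0) -> bool:
    """True if the last sample exceeds the first by more than `min_growth`
    (a fraction; hypothetical threshold). Compaction makes the size noisy,
    so only the endpoints of the sampling window are compared."""
    if len(samples) < 2:
        return False
    return samples[-1] > samples[0] * (1.0 + min_growth)
```

For example, sampling every few minutes before and after disabling clog_to_monitors, then calling `is_growing` on each window, gives a rough answer to whether the change stopped the growth. Comparing only endpoints is a deliberate simplification; a noisier store may need a longer window or a non-zero `min_growth`.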

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
