Hi Eugen, thanks for your answer. All our mons use rocksdb. I found some
old threads, but they never really explained anything. What irritates me
is that this is a silent corruption: if you don't read the logs every
day, you will not see it, and ceph status reports HEALTH_OK. That's also
why I'm wondering whether this is a real issue or not.

It would be great if someone could shed light on (1) how serious this
is, (2) why it doesn't trigger a health warning/error, and (3) why the
affected mon doesn't sync back from the majority right away.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 03 January 2023 15:04:34
To: ceph-users@xxxxxxx
Subject: Re: mon scrub error (scrub mismatch)

Hi Frank,

I had this a few years back and ended up recreating the MON with the
scrub mismatch, so in your case it probably would be mon.0. To test if
the problem still exists, you can trigger a mon scrub manually:

ceph mon scrub

Are all MONs on the rocksdb backend in this cluster? I didn't check back
then if this was the case in our cluster, so I'm just wondering if that
could be an explanation.

Regards,
Eugen

Quoting Frank Schilder <frans@xxxxxx>:

> Hi all,
>
> we have these messages in our logs daily:
>
> 1/3/23 12:20:00 PM [INF] overall HEALTH_OK
> 1/3/23 12:19:46 PM [ERR] mon.2 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
> 1/3/23 12:19:46 PM [ERR] mon.0 ScrubResult(keys {auth=78,config=2,health=11,logm=9} crc {auth=325876668,config=4279003239,health=3522308637,logm=1083913445})
> 1/3/23 12:19:46 PM [ERR] scrub mismatch
> 1/3/23 12:19:46 PM [ERR] mon.1 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
> 1/3/23 12:19:46 PM [ERR] mon.0 ScrubResult(keys {auth=78,config=2,health=11,logm=9} crc {auth=325876668,config=4279003239,health=3522308637,logm=1083913445})
> 1/3/23 12:19:46 PM [ERR] scrub mismatch
> 1/3/23 12:17:04 PM [INF] Cluster is now healthy
> 1/3/23 12:17:04 PM [INF] Health check cleared: MON_CLOCK_SKEW (was: clock skew detected on mon.tceph-02)
>
> The cluster is HEALTH_OK:
>
> # ceph status
>   cluster:
>     id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03 (age 3M)
>     mgr: tceph-01(active, since 8w), standbys: tceph-03, tceph-02
>     mds: fs:1 {0=tceph-02=up:active} 2 up:standby
>     osd: 9 osds: 9 up (since 3M), 9 in
>
>   task status:
>
>   data:
>     pools:   4 pools, 321 pgs
>     objects: 9.94M objects, 336 GiB
>     usage:   1.6 TiB used, 830 GiB / 2.4 TiB avail
>     pgs:     321 active+clean
>
> Unfortunately, Google wasn't of much help. Is this scrub error
> something to worry about?
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
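
For readers who find this thread later: Eugen's two checks can be run
directly from a mon host. A minimal sketch, assuming a package-based
(non-containerized) deployment with the default cluster name "ceph";
the mon id tceph-01 and the log path are taken from the status output
above and may differ on other setups:

    # each mon records its key-value backend in a one-line marker file
    # written at mkfs time; prints "rocksdb" or "leveldb"
    cat /var/lib/ceph/mon/ceph-tceph-01/kv_backend

    # trigger a scrub of the mon stores across the quorum
    ceph mon scrub

    # then watch the cluster log on a mon host; "scrub mismatch"
    # entries mean the stores returned differing key counts/CRCs
    grep -E 'scrub mismatch|ScrubResult' /var/log/ceph/ceph.log | tail

Because kv_backend is set when the store is created, it reflects what
the mon was originally built with, not just the running version.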
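
If the mismatch persists and you follow Eugen's approach of recreating
the affected mon, the usual route is the remove-and-re-add procedure
from the Ceph "Adding/Removing Monitors" documentation: a rebuilt mon
with an empty store syncs everything back from the remaining quorum. A
rough sketch, assuming the mismatching mon.0 is rank 0 on tceph-01 and
a systemd-managed, non-cephadm install (exact steps vary by release, so
double-check against the docs for your version):

    # on the affected host: stop the daemon and set its store aside
    systemctl stop ceph-mon@tceph-01
    mv /var/lib/ceph/mon/ceph-tceph-01 /root/mon-tceph-01.bak

    # drop it from the monmap while the other two mons keep quorum
    ceph mon remove tceph-01

    # rebuild an empty store from the current monmap and mon keyring
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph-mon -i tceph-01 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-tceph-01

    # on some releases you may need to re-add it to the monmap first:
    #   ceph mon add tceph-01 <mon-addr>
    # on start, the mon joins and syncs its store from the majority
    systemctl start ceph-mon@tceph-01

Keeping the old store directory as a backup (rather than deleting it)
leaves room to compare the two stores afterwards or to roll back if the
rebuilt mon fails to join.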