Re: mon scrub error (scrub mismatch)

Hi Dan,

it went unnoticed and is present in all log files, including the rotated ones. I also wondered about the difference in the number of auth keys and looked into it. However, we have only 23 auth keys (it's a small test cluster), so I have no idea what the 77/78 refer to. Maybe they include some history?

I went ahead and rebuilt the mon store before I got your e-mail, so no further debugging is possible unless some of the log information is still useful.

I'm more wondering why this is not flagged as a health issue. Is it harmless? What if things degrade even more over time?

In older versions (well, Luminous) it seems this was flagged as an error. It would also be nice to have a command like "ceph mon repair" or "ceph mon resync" instead of having to do a complete manual rebuild of the daemon.
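
For the archives: what I did follows the usual "remove and re-add a monitor" procedure, roughly like the sketch below. It is only a sketch; the names (mon.0 on host tceph-01), the default data path, a non-containerized systemd deployment, and ceph.conf still listing the mon are all assumptions, so adjust to your setup.

# sketch only: example names/paths (mon.0 on tceph-01, default mon data dir)
# stop the mismatching mon and drop it from the monmap
systemctl stop ceph-mon@tceph-01
ceph mon remove tceph-01

# move the old store aside and build a fresh one from the surviving quorum
mv /var/lib/ceph/mon/ceph-tceph-01 /var/lib/ceph/mon/ceph-tceph-01.bak
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring
mkdir /var/lib/ceph/mon/ceph-tceph-01
ceph-mon --mkfs -i tceph-01 --monmap /tmp/monmap --keyring /tmp/mon.keyring
chown -R ceph:ceph /var/lib/ceph/mon/ceph-tceph-01

# start it again; assuming ceph.conf still lists this mon, it rejoins the
# quorum and syncs a clean store
systemctl start ceph-mon@tceph-01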

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dvanders@xxxxxxxxx>
Sent: 03 January 2023 19:41
To: Frank Schilder
Cc: Eugen Block; ceph-users@xxxxxxx
Subject: Re:  Re: mon scrub error (scrub mismatch)

Hi Frank,

Can you work backwards in the logs to when this first appeared?
The scrub error shows that mon.0 has 78 auth keys while the other
two have 77. So you'd have to query the auth keys through each mon to
see if you get a different response each time (e.g. ceph auth list),
and compare with what you expect.
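
Something like the following might do for the comparison; it's just a
sketch, and the monitor addresses below are placeholders for your
actual mon IPs/ports:

# dump the auth database as seen through each monitor, then diff
# (10.0.0.1-3:6789 are placeholder addresses, not your real mons)
for m in 10.0.0.1 10.0.0.2 10.0.0.3; do
    ceph -m $m:6789 auth ls > /tmp/auth.$m.txt
done
diff /tmp/auth.10.0.0.1.txt /tmp/auth.10.0.0.2.txt
diff /tmp/auth.10.0.0.1.txt /tmp/auth.10.0.0.3.txt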

Cheers, Dan

On Tue, Jan 3, 2023 at 9:29 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Eugen,
>
> thanks for your answer. All our mons use rocksdb.
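>
> (Checked with something along these lines on each mon host; just a
> sketch assuming the default data path and that the mon id is the
> short hostname -- it should print "rocksdb":)
>
> # sketch: default mon data dir, mon id = short hostname
> cat /var/lib/ceph/mon/ceph-$(hostname -s)/kv_backend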
>
> I found some old threads, but they never really explained anything. What irritates me is that this is a silent corruption: if you don't read the logs every day, you will not see it, because ceph status reports HEALTH_OK. That's also why I'm wondering whether this is a real issue or not.
>
> It would be great if someone could shed light on (1) how serious this is, (2) why it doesn't trigger a health warning/error and (3) why the affected mon doesn't sync back from the majority right away.
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: 03 January 2023 15:04:34
> To: ceph-users@xxxxxxx
> Subject:  Re: mon scrub error (scrub mismatch)
>
> Hi Frank,
>
> I had this a few years back and ended up recreating the MON with the
> scrub mismatch, so in your case that would probably be mon.0. To test
> whether the problem still exists, you can trigger a mon scrub manually:
>
> ceph mon scrub
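>
> (Afterwards, watch the cluster log for new ScrubResult / "scrub
> mismatch" entries, e.g. with something like the command below, or
> just grep the leader mon's log file:)
>
> # show the most recent cluster log entries after the manual scrub
> ceph log last 50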
>
> Are all MONs on the rocksdb back end in this cluster? I didn't check
> back then whether this was the case in our cluster, so I'm just
> wondering if that could be an explanation.
>
> Regards,
> Eugen
>
> Zitat von Frank Schilder <frans@xxxxxx>:
>
> > Hi all,
> >
> > we have these messages in our logs daily:
> >
> > 1/3/23 12:20:00 PM[INF]overall HEALTH_OK
> > 1/3/23 12:19:46 PM[ERR] mon.2 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
> > 1/3/23 12:19:46 PM[ERR] mon.0 ScrubResult(keys {auth=78,config=2,health=11,logm=9} crc {auth=325876668,config=4279003239,health=3522308637,logm=1083913445})
> > 1/3/23 12:19:46 PM[ERR]scrub mismatch
> > 1/3/23 12:19:46 PM[ERR] mon.1 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
> > 1/3/23 12:19:46 PM[ERR] mon.0 ScrubResult(keys {auth=78,config=2,health=11,logm=9} crc {auth=325876668,config=4279003239,health=3522308637,logm=1083913445})
> > 1/3/23 12:19:46 PM[ERR]scrub mismatch
> > 1/3/23 12:17:04 PM[INF]Cluster is now healthy
> > 1/3/23 12:17:04 PM[INF]Health check cleared: MON_CLOCK_SKEW (was: clock skew detected on mon.tceph-02)
> >
> > Cluster is health OK:
> >
> > # ceph status
> >   cluster:
> >     id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
> >     health: HEALTH_OK
> >
> >   services:
> >     mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03 (age 3M)
> >     mgr: tceph-01(active, since 8w), standbys: tceph-03, tceph-02
> >     mds: fs:1 {0=tceph-02=up:active} 2 up:standby
> >     osd: 9 osds: 9 up (since 3M), 9 in
> >
> >   task status:
> >
> >   data:
> >     pools:   4 pools, 321 pgs
> >     objects: 9.94M objects, 336 GiB
> >     usage:   1.6 TiB used, 830 GiB / 2.4 TiB avail
> >     pgs:     321 active+clean
> >
> > Unfortunately, Google wasn't of much help. Is this scrub error
> > something to worry about?
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



