Re: ceph pgs inconsistent, always the same checksum

Hi,

On Tue, Sep 8, 2020 at 11:20 PM David Orman <ormandj@xxxxxxxxxxxx> wrote:

>
> Every time we look at them, we see the same checksum (0x6706be76):
>
> This looks a lot like: https://tracker.ceph.com/issues/22464
>
>
Some more context on this, since I built the work-around for this issue:

* the checksum is for a block of all zeroes
* it seemed to happen when memory ran low
* it is *NOT* related to swap: it happened on systems with swap disabled
and no file-backed mmapped memory (BlueStore-only servers w/o non-OSD disks)
* it only showed up on some kernel versions
* re-trying the read did solve it; it was very rare to see two consecutive
read failures, and I never saw one persist through 3 retries
* the root cause was never found, as I never managed to reliably reproduce
this on test setups where I could play around with bisecting the kernel :(

Here's the patch that added the read retries:
https://github.com/ceph/ceph/pull/23273/files

What you can do is:

1. check the performance counter bluestore_reads_with_retries on the affected
OSDs; if you are hitting this bug, it should be non-zero
2. increase the setting bluestore_retry_disk_reads (default 3) to see if that
helps (example commands below)
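
For example (osd.<id> is a placeholder and 5 is just an example value; in a
containerized cephadm deployment you may need to run the perf dump from
inside the OSD container, e.g. via "cephadm enter --name osd.<id>", or use
"ceph tell osd.<id> perf dump" if your version supports it):

  # non-zero means a read came back bad and then succeeded on retry
  ceph daemon osd.<id> perf dump | grep bluestore_reads_with_retries

  # raise the retry count for all OSDs (default is 3)
  ceph config set osd bluestore_retry_disk_reads 5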

Anyway, what you are seeing might be something completely different from
whatever caused this bug... but it's worth playing around with the retry
option.
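
To check whether your scrub errors even match this pattern: the checksum
failures are logged by BlueStore's _verify_csum, so grepping the OSD log
(how you get at it depends on your deployment, e.g. "cephadm logs --name
osd.<id>", journalctl, or docker/podman logs) should show "got 0x6706be76"
lines if it is the all-zeroes read:

  cephadm logs --name osd.<id> | grep _verify_csum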

Paul


> That said, we've got the following versions in play (cluster was created
> with 15.2.3):
>
> ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus
> (stable)
>
>
> This is a containerized cephadm installation, in case it's relevant.
> Distribution is Ubuntu 18.04.04, kernel is the HWE kernel:
>
> Linux ceph02 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24
> UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
> A repair operation 'fixes' it. These are occurring across many PGs, on
> various servers, and we see no indication of any hardware-related issues.
>
> Any ideas what to do next?


