Re: erasure coded pool PG stuck inconsistent on ceph Pacific 15.2.13

We stopped deep scrubbing a while ago. However, forcing a deep scrub with "ceph pg deep-scrub 6.180" doesn't do anything; the deep scrub doesn't run at all. Could the deep scrub process be stuck elsewhere?

On 11/18/21 3:29 PM, Wesley Dillingham wrote:
That response is typically indicative of a PG whose OSD set has changed since it was last scrubbed (typically because a disk failed).

Are you sure it's actually getting scrubbed when you issue the scrub? For example, you can issue "ceph pg <pg_id> query" and look for "last_deep_scrub_stamp", which will tell you when it was last deep scrubbed.
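
Something along these lines should show the stamps directly (a rough sketch, assuming jq is available; the JSON paths are from memory, so adjust them if your version lays the query output out differently):

    # when the PG was last scrubbed / deep scrubbed
    ceph pg 6.180 query | jq '.info.stats.last_scrub_stamp, .info.stats.last_deep_scrub_stamp'

If the deep-scrub stamp doesn't advance after you issue the scrub, it never actually ran.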

Further, in sufficiently recent versions of Ceph (introduced in 14.2.something, iirc), setting the "nodeep-scrub" flag will cause all in-flight deep scrubs to stop immediately. You may have a scheduling issue where your deep scrubs or repairs aren't getting scheduled.
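
A few quick checks for that sort of scheduling problem (just a sketch; the option names should be the same on recent releases, but verify against your version):

    # cluster-wide flags -- noscrub / nodeep-scrub here would explain it
    ceph osd dump | grep flags
    # how many concurrent scrubs each OSD will accept (default is 1)
    ceph config get osd osd_max_scrubs
    # any scrub-related health warnings
    ceph health detail | grep -i scrub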

Set the nodeep-scrub flag with "ceph osd set nodeep-scrub" and wait for all current deep scrubs to complete, then manually re-issue the deep scrub with "ceph pg deep-scrub <pg_id>". At this point your scrub should start almost immediately, and "rados list-inconsistent-obj 6.180 --format=json-pretty" should return something of value.
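
Roughly this sequence, with your pg_id substituted in (and don't forget to unset the flag afterwards):

    ceph osd set nodeep-scrub
    # wait until "ceph -s" no longer shows any PGs deep scrubbing, then:
    ceph pg deep-scrub 6.180
    rados list-inconsistent-obj 6.180 --format=json-pretty
    ceph osd unset nodeep-scrub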

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, Nov 18, 2021 at 2:38 PM J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:

    Hi,

    We currently have a PG stuck in an inconsistent state on an erasure
    coded pool. The pool's K and M values are 33 and 3. The command
    "rados list-inconsistent-obj 6.180 --format=json-pretty" results in
    the following error:

    No scrub information available for pg 6.180 error 2: (2) No such
    file or directory

    Forcing a deep scrub of the PG does not fix this. Doing a "ceph pg
    repair 6.180" doesn't seem to do anything. Is there a known bug
    explaining this behavior? I am attaching information regarding the
    PG in question.

    --
    Jean-Philippe Méthot
    Senior Openstack system administrator
    Administrateur système Openstack sénior
    PlanetHoster inc.


--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



