Re: Recommended procedure in case of OSD_SCRUB_ERRORS / PG_DAMAGED

Thanks, I will try this the next time!

On Wed, Oct 19, 2022 at 13:50, Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> you don't need to stop the OSDs, just query the inconsistent object,
> here's a recent example (from an older cluster, though):
>
> ---snip---
>      health: HEALTH_ERR
>              1 scrub errors
>              Possible data damage: 1 pg inconsistent
>
> admin:~ # ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>      pg 7.17a is active+clean+inconsistent, acting [15,2,58,33,28,69]
>
> admin:~ # rados -p cephfs_data list-inconsistent-obj 7.17a | jq
> [...]
>        "shards": [
>          {
>            "osd": 2,
>            "primary": false,
>            "errors": [],
>            "size": 2780496,
>            "omap_digest": "0xffffffff",
>            "data_digest": "0x11e1764c"
>          },
>          {
>            "osd": 15,
>            "primary": true,
>            "errors": [],
>            "size": 2780496,
>            "omap_digest": "0xffffffff",
>            "data_digest": "0x11e1764c"
>          },
>          {
>            "osd": 28,
>            "primary": false,
>            "errors": [],
>            "size": 2780496,
>            "omap_digest": "0xffffffff",
>            "data_digest": "0x11e1764c"
>          },
>          {
>            "osd": 33,
>            "primary": false,
>            "errors": [
>              "read_error"
>            ],
>            "size": 2780496
>          },
>          {
>            "osd": 58,
>            "primary": false,
>            "errors": [],
>            "size": 2780496,
>            "omap_digest": "0xffffffff",
>            "data_digest": "0x11e1764c"
>          },
>          {
>            "osd": 69,
>            "primary": false,
>            "errors": [],
>            "size": 2780496,
>            "omap_digest": "0xffffffff",
>            "data_digest": "0x11e1764c"
> ---snip---
>
> Five of the six omap_digest and data_digest values were identical, so
> it was safe to run 'ceph pg repair 7.17a'.
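>
> For completeness, a minimal sketch of the repair-and-verify steps for
> that case (same pgid as above):
>
>    # tell the primary to repair the PG from an authoritative copy
>    ceph pg repair 7.17a
>
>    # watch the PG state until it returns to active+clean
>    ceph pg 7.17a query | jq .state
>    ceph health detail
>
>    # optionally run another deep scrub to confirm consistency
>    ceph pg deep-scrub 7.17a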
>
> Regards,
> Eugen
>
> Quoting E Taka <0etaka0@xxxxxxxxx>:
>
> > (Ceph 17.2.4, replicated pools with size 3, container install)
> >
> > Hello,
> >
> > since much of the information found on the web or in books is outdated,
> > I want to ask which procedure is recommended to repair a damaged PG with
> > status active+clean+inconsistent on Ceph Quincy.
> >
> > IMHO, the best process for a pool with 3 replicas would be to check
> > whether two of the replicas are identical and replace the third,
> > differing one.
> >
> > If I understand it correctly, ceph-objectstore-tool could be used for
> > this approach, but unfortunately it is difficult even to start,
> > especially in a Docker environment. (The OSD has to be marked "down",
> > and the Ubuntu package ceph-osd, which includes ceph-objectstore-tool,
> > starts server processes that confuse the dockerized environment.)
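> >
> > For reference, a rough sketch of how I would expect to reach the tool in
> > a cephadm/containerized setup ("osd.NN" and "<pgid>" are placeholders,
> > and I have not verified that this works cleanly):
> >
> >   # stop the OSD daemon (its container); the on-disk data is untouched
> >   ceph orch daemon stop osd.NN
> >
> >   # open a shell in a container that has the OSD's data dir mounted
> >   cephadm shell --name osd.NN
> >
> >   # inside that shell, ceph-objectstore-tool is available, e.g. to list
> >   # the objects of a PG (the data path may differ on other setups):
> >   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
> >       --op list --pgid <pgid>
> >
> >   # when finished, exit the shell and start the daemon again
> >   ceph orch daemon start osd.NN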
> >
> > Is “ceph pg repair” safe to use, and is there any risk in enabling
> > osd_scrub_auto_repair and osd_repair_during_recovery?
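> >
> > For reference, those two options could presumably be inspected and
> > changed at runtime like this (a sketch, not something I have tried):
> >
> >   # show the current values (both default to false)
> >   ceph config get osd osd_scrub_auto_repair
> >   ceph config get osd osd_repair_during_recovery
> >
> >   # enable automatic repair of scrub errors cluster-wide
> >   ceph config set osd osd_scrub_auto_repair true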
> >
> > Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



