Re: [ceph-users] Failed to repair pg

On Thu, Mar 07, 2019 at 01:37:55PM -0300, Herbert Alexander Faleiros wrote:
> Hi,
> 
> # ceph health detail
> HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 3 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 2.2bb is active+clean+inconsistent, acting [36,12,80]
> 
> # ceph pg repair 2.2bb
> instructing pg 2.2bb on osd.36 to repair
> 
> But:
> 
> 2019-03-07 13:23:38.636881 [ERR]  Health check update: Possible data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED) 
> 2019-03-07 13:20:38.373431 [ERR]  2.2bb deep-scrub 3 errors 
> 2019-03-07 13:20:38.373426 [ERR]  2.2bb deep-scrub 0 missing, 1 inconsistent objects 
> 2019-03-07 13:20:43.486860 [ERR]  Health check update: 3 scrub errors (OSD_SCRUB_ERRORS) 
> 2019-03-07 13:19:17.741350 [ERR]  deep-scrub 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : is an unexpected clone 
> 2019-03-07 13:19:17.523042 [ERR]  2.2bb shard 36 soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : data_digest 0xffffffff != data_digest 0xfc6b9538 from shard 12, size 0 != size 4194304 from auth oi 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986(482757'14986708 client.112595650.0:344888465 dirty|omap_digest s 4194304 uv 14974021 od ffffffff alloc_hint [0 0 0]), size 0 != size 4194304 from shard 12 
> 2019-03-07 13:19:17.523038 [ERR]  2.2bb shard 36 soid 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : candidate size 0 info size 4194304 mismatch 
> 2019-03-07 13:16:48.542673 [ERR]  2.2bb repair 2 errors, 1 fixed 
> 2019-03-07 13:16:48.542656 [ERR]  2.2bb repair 1 missing, 0 inconsistent objects 
> 2019-03-07 13:16:53.774956 [ERR]  Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED) 
> 2019-03-07 13:16:53.774916 [ERR]  Health check update: 2 scrub errors (OSD_SCRUB_ERRORS) 
> 2019-03-07 13:15:16.986872 [ERR]  repair 2.2bb 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : is an unexpected clone 
> 2019-03-07 13:15:16.986817 [ERR]  2.2bb shard 36 2:dd4a7bd3:::rbd_data.dfd5e2235befd0.000000000001c299:4f986 : missing 
> 2019-03-07 13:12:18.517442 [ERR]  Health check update: Possible data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED) 
> 
> Also tried deep-scrub and scrub, same results.
> 
> I also set noscrub,nodeep-scrub and kicked the currently active scrubs
> one at a time using 'ceph osd down <id>'. After the last scrub was
> kicked, the forced scrub ran immediately, then I ran 'ceph pg repair':
> no luck.
> 
> Finally tried the manual approach:
> 
>  - stop osd.36
>  - flush-journal
>  - rm rbd\udata.dfd5e2235befd0.000000000001c299__4f986_CBDE52BB__2
>  - start osd.36
>  - ceph pg repair 2.2bb
> 
> Also no luck...
> 
> rbd\udata.dfd5e2235befd0.000000000001c299__4f986_CBDE52BB__2 on osd.36
> is empty (size 0). On osd.80 it is 4.0M; osd.12 is bluestore (can't
> find it there).
> 
> Ceph is 12.2.10, and I'm currently migrating all my OSDs to bluestore.
> 
> Is there anything else I can do?

Should I do something like this (below, after stopping osd.36)?

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-36/ --journal-path /dev/sdc1 rbd_data.dfd5e2235befd0.000000000001c299 remove-clone-metadata 326022

I'm not sure about rbd_data.$RBD and $CLONEID (taken from rados
list-inconsistent-obj, also below).
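
On the $CLONEID question, one thing that can be checked without touching
the OSD is whether the identifiers in the different outputs refer to the
same clone. A minimal Python sketch, using only values copied from the
logs in this thread (the variable names are mine): the decimal "snap"
326022 from list-inconsistent-obj should equal the hex suffix 4f986 in
the scrub log, and the decimal "hash" should equal the CBDE52BB part of
the filestore file name. This only confirms that the numbers match; it
does not say whether remove-clone-metadata is the right fix.

```python
# Cross-check of identifiers quoted in this thread (values copied from the logs).
snapid_dec = 326022      # "snap" / "snapid" from rados list-inconsistent-obj
snapid_log = "4f986"     # suffix after the last ':' of the soid in the scrub log
hash_dec = 3420345019    # "hash" from selected_object_info.oid
hash_file = "CBDE52BB"   # hash component of the filestore file name on osd.36

# Compare the decimal values against their hexadecimal spellings.
snap_matches = format(snapid_dec, "x") == snapid_log
hash_matches = format(hash_dec, "X") == hash_file

print("snap id matches log suffix:", snap_matches)
print("object hash matches file name:", hash_matches)
```

Both comparisons come out True for these values, so 326022 and 4f986 are
the same clone id written in decimal and in hex.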

> # rados list-inconsistent-obj 2.2bb | jq
> {
>   "epoch": 484655,
>   "inconsistents": [
>     {
>       "object": {
>         "name": "rbd_data.dfd5e2235befd0.000000000001c299",
>         "nspace": "",
>         "locator": "",
>         "snap": 326022,
>         "version": 14974021
>       },
>       "errors": [
>         "data_digest_mismatch",
>         "size_mismatch"
>       ],
>       "union_shard_errors": [
>         "size_mismatch_info",
>         "obj_size_info_mismatch"
>       ],
>       "selected_object_info": {
>         "oid": {
>           "oid": "rbd_data.dfd5e2235befd0.000000000001c299",
>           "key": "",
>           "snapid": 326022,
>           "hash": 3420345019,
>           "max": 0,
>           "pool": 2,
>           "namespace": ""
>         },
>         "version": "482757'14986708",
>         "prior_version": "482697'14980304",
>         "last_reqid": "client.112595650.0:344888465",
>         "user_version": 14974021,
>         "size": 4194304,
>         "mtime": "2019-03-02 22:30:23.812849",
>         "local_mtime": "2019-03-02 22:30:23.813281",
>         "lost": 0,
>         "flags": [
>           "dirty",
>           "omap_digest"
>         ],
>         "legacy_snaps": [],
>         "truncate_seq": 0,
>         "truncate_size": 0,
>         "data_digest": "0xffffffff",
>         "omap_digest": "0xffffffff",
>         "expected_object_size": 0,
>         "expected_write_size": 0,
>         "alloc_hint_flags": 0,
>         "manifest": {
>           "type": 0,
>           "redirect_target": {
>             "oid": "",
>             "key": "",
>             "snapid": 0,
>             "hash": 0,
>             "max": 0,
>             "pool": -9223372036854776000,
>             "namespace": ""
>           }
>         },
>         "watchers": {}
>       },
>       "shards": [
>         {
>           "osd": 12,
>           "primary": false,
>           "errors": [],
>           "size": 4194304,
>           "omap_digest": "0xffffffff",
>           "data_digest": "0xfc6b9538"
>         },
>         {
>           "osd": 36,
>           "primary": true,
>           "errors": [
>             "size_mismatch_info",
>             "obj_size_info_mismatch"
>           ],
>           "size": 0,
>           "omap_digest": "0xffffffff",
>           "data_digest": "0xffffffff",
>           "object_info": {
>             "oid": {
>               "oid": "rbd_data.dfd5e2235befd0.000000000001c299",
>               "key": "",
>               "snapid": 326022,
>               "hash": 3420345019,
>               "max": 0,
>               "pool": 2,
>               "namespace": ""
>             },
>             "version": "482757'14986708",
>             "prior_version": "482697'14980304",
>             "last_reqid": "client.112595650.0:344888465",
>             "user_version": 14974021,
>             "size": 4194304,
>             "mtime": "2019-03-02 22:30:23.812849",
>             "local_mtime": "2019-03-02 22:30:23.813281",
>             "lost": 0,
>             "flags": [
>               "dirty",
>               "omap_digest"
>             ],
>             "legacy_snaps": [],
>             "truncate_seq": 0,
>             "truncate_size": 0,
>             "data_digest": "0xffffffff",
>             "omap_digest": "0xffffffff",
>             "expected_object_size": 0,
>             "expected_write_size": 0,
>             "alloc_hint_flags": 0,
>             "manifest": {
>               "type": 0,
>               "redirect_target": {
>                 "oid": "",
>                 "key": "",
>                 "snapid": 0,
>                 "hash": 0,
>                 "max": 0,
>                 "pool": -9223372036854776000,
>                 "namespace": ""
>               }
>             },
>             "watchers": {}
>           }
>         },
>         {
>           "osd": 80,
>           "primary": false,
>           "errors": [],
>           "size": 4194304,
>           "omap_digest": "0xffffffff",
>           "data_digest": "0xfc6b9538"
>         }
>       ]
>     }
>   ]
> }
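
To make the JSON above easier to read, here is a rough Python sketch that
lists only the shards reporting errors. The function name
shards_with_errors is hypothetical, and the sample is a trimmed copy of
the output above (only the fields the sketch reads); in practice the
input would be the full output of rados list-inconsistent-obj.

```python
import json


def shards_with_errors(report):
    """Return (name, snap, osd, errors, size) for every shard that reports errors."""
    rows = []
    for item in report.get("inconsistents", []):
        obj = item["object"]
        for shard in item.get("shards", []):
            if shard.get("errors"):
                rows.append(
                    (obj["name"], obj["snap"], shard["osd"], shard["errors"], shard["size"])
                )
    return rows


# Trimmed copy of the list-inconsistent-obj output quoted in this thread.
sample = json.loads("""
{
  "epoch": 484655,
  "inconsistents": [
    {
      "object": {
        "name": "rbd_data.dfd5e2235befd0.000000000001c299",
        "snap": 326022
      },
      "shards": [
        {"osd": 12, "primary": false, "errors": [], "size": 4194304},
        {"osd": 36, "primary": true,
         "errors": ["size_mismatch_info", "obj_size_info_mismatch"],
         "size": 0},
        {"osd": 80, "primary": false, "errors": [], "size": 4194304}
      ]
    }
  ]
}
""")

rows = shards_with_errors(sample)
for name, snap, osd, errors, size in rows:
    print(f"{name} snap {snap}: osd.{osd} size {size} errors {errors}")
```

With this sample only osd.36 (the primary) is listed; osd.12 and osd.80
report no errors and agree on size 4194304.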


