Re: 9 out of 11 missing shards of shadow object in EC 8:3 pool.

Hi Robert,

Thanks for the update; it's great that the issue is resolved.

Quoting Robert Kihlberg <robkih@xxxxxxxxx>:

Thanks Josh and Eugen,

I did not manage to trace this object to an S3 object. Instead, I read all
files in the suspected S3 bucket (roughly as sketched below) and actually hit
a bad one. Since we had a known-good mirror, I deleted the broken S3 object
(which succeeded) and uploaded the good one (which also succeeded). Data-wise
we're fine.
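
A minimal sketch of that scan, assuming awscli is pointed at the RGW
endpoint and $BUCKET is the bucket name (not the exact script we ran,
and keys containing spaces would need more careful parsing):

aws s3 ls "s3://$BUCKET" --recursive | awk '{print $4}' |
  while read -r key; do
    # stream each object to /dev/null; a broken object makes the GET fail
    aws s3 cp "s3://$BUCKET/$key" - > /dev/null || echo "unreadable: $key"
  done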

After this, a deep-scrub still reported the PG to be inconsistent.
[DBG] 11.3ff deep-scrub starts
[ERR] 11.3ffs0 deep-scrub : stat mismatch, got 417274/417273 objects, 0/0
clones, 417274/417273 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0
whiteouts, 2997719087159/2997710698551 bytes, 0/0 manifest objects, 0/0
hit_set_archive bytes.
[ERR] 11.3ff deep-scrub 1 errors
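
(The scrub itself was just a manual deep scrub; triggering and checking it
goes something like:

ceph pg deep-scrub 11.3ff
ceph health detail

with the health output showing the resulting inconsistency warning.)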

However, rados list-inconsistent-obj 11.3ff did not report any more details.
{
  "epoch": 177981,
  "inconsistents": []
}

We were hesitant to run a repair on it since we didn't know what
would happen, but I accidentally did it anyway...
[DBG] 11.3ff repair starts
[ERR] 11.3ffs0 repair : stat mismatch, got 420131/420130 objects, 0/0
clones, 420131/420130 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0
whiteouts, 3020439864945/3020431476337 bytes, 0/0 manifest objects, 0/0
hit_set_archive bytes.
[ERR] 11.3ff repair 1 errors, 1 fixed

It appears to have fixed it, and scrubbing the PG afterwards reports no issues.
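
(For the record, the repair was the ordinary PG repair, i.e. something along
the lines of:

ceph pg repair 11.3ff

followed by another deep scrub to verify.)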

So this issue seems solved for now.

Best Regards,
Robert


On Sat, 5 Oct 2024 at 14:45, Eugen Block <eblock@xxxxxx> wrote:

This reminds me of this tracker:

https://tracker.ceph.com/issues/50351

IIRC, the information could actually be lost on the OSDs. I'm
surprised that the number of missing shards is that high, though. If
you have the objects mirrored, maybe importing it with
objectstore-tool could be a way forward. But I'm really not sure if
that would be the right approach.
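
(To sketch what I have in mind, untested and with paths/ids only as
assumptions: with the affected OSD stopped, ceph-objectstore-tool can read
and write a shard directly, e.g.

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid 11.3ffs1 <object-json> get-bytes /tmp/shard.bin
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid 11.3ffs1 <object-json> set-bytes /tmp/shard.bin

where <object-json> is the object description from the '--op list' output.
Whether set-bytes gives you a valid EC shard is exactly the part I'm not
sure about.)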

Quoting Robert Kihlberg <robkih@xxxxxxxxx>:

> After an upgrade from Nautilus to Pacific, the scrub found an inconsistent
> object and reports that 9 out of 11 shards are missing. (However, we're not
> sure this has anything to do with the upgrade.)
>
> We have been able to trace it to an S3 bucket, but not to a specific S3
> object.
>
> # radosgw-admin object stat --bucket=$BUCKET --object=$OBJECT
> ERROR: failed to stat object, returned error: (2) No such file or directory
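>
> (The shadow name can at least be matched against the bucket's backing rados
> objects, e.g. with something like
>
> # radosgw-admin bucket radoslist --bucket=$BUCKET | grep 791015596.129__shadow_
>
> If the tail were still referenced by an S3 object it should show up there;
> if not, that in itself would hint it was left behind.)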
>
> By design, we have a complete mirror of the bucket in another Ceph cluster,
> and the number of objects in the buckets matches between the clusters. We are
> therefore somewhat confident that we are not missing any objects.
>
> Could this be a failed garbage collection where perhaps the primary OSD
> failed during gc?
>
> The garbage collector does not show anything that seems relevant, though:
> # radosgw-admin gc list --include-all | grep "eaa6801e-3967-4541-9b8ca98aa5c2.791015596"
>
> Any suggestions on how we can trace and/or fix this inconsistent object?
>
> # rados list-inconsistent-obj 11.3ff | jq
> {
>   "epoch": 177981,
>   "inconsistents": [
>     {
>       "object": {
>         "name": "eaa6801e-3967-4541-9b8ca98aa5c2.791015596.129__shadow_.3XHvgPjrJa3erG4rPlW3brboBWagE95_5",
>         "nspace": "",
>         "locator": "",
>         "snap": "head",
>         "version": 109853
>       },
>       "errors": [],
>       "union_shard_errors": [
>         "missing"
>       ],
>       "selected_object_info": {
>         "oid": {
>           "oid": "eaa6801e-3967-4541-9b8ca98aa5c2.791015596.129__shadow_.3XHvgPjrJa3erG4rPlW3brboBWagE95_5",
>           "key": "",
>           "snapid": -2,
>           "hash": 4294967295,
>           "max": 0,
>           "pool": 11,
>           "namespace": ""
>         },
>         "version": "17636'109853",
>         "prior_version": "0'0",
>         "last_reqid": "client.791015590.0:449317175",
>         "user_version": 109853,
>         "size": 8388608,
>         "mtime": "2022-01-24T03:33:42.457722+0000",
>         "local_mtime": "2022-01-24T03:33:42.471042+0000",
>         "lost": 0,
>         "flags": [
>           "dirty",
>           "data_digest"
>         ],
>         "truncate_seq": 0,
>         "truncate_size": 0,
>         "data_digest": "0xe588978d",
>         "omap_digest": "0xffffffff",
>         "expected_object_size": 0,
>         "expected_write_size": 0,
>         "alloc_hint_flags": 0,
>         "manifest": {
>           "type": 0
>         },
>         "watchers": {}
>       },
>       "shards": [
>         {
>           "osd": 14,
>           "primary": true,
>           "shard": 0,
>           "errors": [],
>           "size": 1048576
>         },
>         {
>           "osd": 67,
>           "primary": false,
>           "shard": 1,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 77,
>           "primary": false,
>           "shard": 4,
>           "errors": [],
>           "size": 1048576
>         },
>         {
>           "osd": 225,
>           "primary": false,
>           "shard": 9,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 253,
>           "primary": false,
>           "shard": 8,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 327,
>           "primary": false,
>           "shard": 6,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 568,
>           "primary": false,
>           "shard": 2,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 610,
>           "primary": false,
>           "shard": 7,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 700,
>           "primary": false,
>           "shard": 3,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 736,
>           "primary": false,
>           "shard": 10,
>           "errors": [
>             "missing"
>           ]
>         },
>         {
>           "osd": 764,
>           "primary": false,
>           "shard": 5,
>           "errors": [
>             "missing"
>           ]
>         }
>       ]
>     }
>   ]
> }
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



