Re: 9 out of 11 missing shards of shadow object in EC 8:3 pool.

Thanks Josh and Eugen,

I did not manage to trace this shadow object to a specific S3 object. Instead,
I read all objects in the suspected S3 bucket and actually hit a bad one.
Since we had a known-good mirror, I deleted the broken S3 object (which
succeeded) and uploaded the good copy (which also succeeded). Data-wise
we're fine.
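
In case it helps anyone else, reading every object back can be scripted with
any S3 client; here is a minimal sketch using aws-cli (not necessarily the
exact commands we ran; $ENDPOINT and $BUCKET are placeholders):

  # list every key in the bucket, try to read each one back,
  # and print the keys that fail to download
  aws --endpoint-url "$ENDPOINT" s3api list-objects-v2 \
      --bucket "$BUCKET" --query 'Contents[].Key' --output text |
    tr '\t' '\n' |
    while read -r key; do
      aws --endpoint-url "$ENDPOINT" s3 cp "s3://$BUCKET/$key" - \
        > /dev/null 2>&1 || echo "BAD: $key"
    done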

After this, a deep-scrub still reported the PG to be inconsistent.
[DBG] 11.3ff deep-scrub starts
[ERR] 11.3ffs0 deep-scrub : stat mismatch, got 417274/417273 objects, 0/0
clones, 417274/417273 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0
whiteouts, 2997719087159/2997710698551 bytes, 0/0 manifest objects, 0/0
hit_set_archive bytes.
[ERR] 11.3ff deep-scrub 1 errors
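
For reference, a deep scrub of a single PG can be triggered manually with:

  ceph pg deep-scrub 11.3ff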

However, rados list-inconsistent-obj 11.3ff did not report any further details:
{
  "epoch": 177981,
  "inconsistents": []
}

We were hesitant to run a repair on it since we didn't know what
would happen, but I accidentally did it anyway...
[DBG] 11.3ff repair starts
[ERR] 11.3ffs0 repair : stat mismatch, got 420131/420130 objects, 0/0
clones, 420131/420130 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0
whiteouts, 3020439864945/3020431476337 bytes, 0/0 manifest objects, 0/0
hit_set_archive bytes.
[ERR] 11.3ff repair 1 errors, 1 fixed
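
For reference, a PG repair is issued the same way:

  ceph pg repair 11.3ff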

The repair appears to have fixed it, and scrubbing it afterwards reports no issues.

So this issue seems solved for now.

Best Regards,
Robert


On Sat, 5 Oct 2024 at 14:45, Eugen Block <eblock@xxxxxx> wrote:

> This reminds me of this tracker:
>
> https://tracker.ceph.com/issues/50351
>
> IIRC, the information could actually be lost on the OSDs. I'm
> surprised that the number of missing shards is that high, though. If
> you have the objects mirrored, maybe importing it with
> objectstore-tool could be a way forward. But I’m really not sure if
> that would be the right approach.
>
> Quoting Robert Kihlberg <robkih@xxxxxxxxx>:
>
> > After an upgrade from Nautilus to Pacific, a scrub found an inconsistent
> > object and reports that 9 out of 11 shards are missing. (However, we're
> > not sure this is related to the upgrade.)
> >
> > We have been able to trace it to an S3 bucket, but not to a specific S3
> > object.
> >
> > # radosgw-admin object stat --bucket=$BUCKET --object=$OBJECT
> > ERROR: failed to stat object, returned error: (2) No such file or directory
> >
> > By design, we have a complete mirror of the bucket in another Ceph cluster,
> > and the number of objects in the bucket matches between the clusters. We are
> > therefore somewhat confident that we are not missing any objects.
> >
> > Could this be a failed garbage collection where perhaps the primary OSD
> > failed during gc?
> >
> > The garbage collector does not show anything that seems relevant though...
> > radosgw-admin gc list --include-all | grep "eaa6801e-3967-4541-9b8ca98aa5c2.791015596"
> >
> > Any suggestions on how we can trace and/or fix this inconsistent object?
> >
> > # rados list-inconsistent-obj 11.3ff | jq
> > {
> >   "epoch": 177981,
> >   "inconsistents": [
> >     {
> >       "object": {
> >         "name": "eaa6801e-3967-4541-9b8ca98aa5c2.791015596.129__shadow_.3XHvgPjrJa3erG4rPlW3brboBWagE95_5",
> >         "nspace": "",
> >         "locator": "",
> >         "snap": "head",
> >         "version": 109853
> >       },
> >       "errors": [],
> >       "union_shard_errors": [
> >         "missing"
> >       ],
> >       "selected_object_info": {
> >         "oid": {
> >           "oid": "eaa6801e-3967-4541-9b8ca98aa5c2.791015596.129__shadow_.3XHvgPjrJa3erG4rPlW3brboBWagE95_5",
> >           "key": "",
> >           "snapid": -2,
> >           "hash": 4294967295,
> >           "max": 0,
> >           "pool": 11,
> >           "namespace": ""
> >         },
> >         "version": "17636'109853",
> >         "prior_version": "0'0",
> >         "last_reqid": "client.791015590.0:449317175",
> >         "user_version": 109853,
> >         "size": 8388608,
> >         "mtime": "2022-01-24T03:33:42.457722+0000",
> >         "local_mtime": "2022-01-24T03:33:42.471042+0000",
> >         "lost": 0,
> >         "flags": [
> >           "dirty",
> >           "data_digest"
> >         ],
> >         "truncate_seq": 0,
> >         "truncate_size": 0,
> >         "data_digest": "0xe588978d",
> >         "omap_digest": "0xffffffff",
> >         "expected_object_size": 0,
> >         "expected_write_size": 0,
> >         "alloc_hint_flags": 0,
> >         "manifest": {
> >           "type": 0
> >         },
> >         "watchers": {}
> >       },
> >       "shards": [
> >         {
> >           "osd": 14,
> >           "primary": true,
> >           "shard": 0,
> >           "errors": [],
> >           "size": 1048576
> >         },
> >         {
> >           "osd": 67,
> >           "primary": false,
> >           "shard": 1,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 77,
> >           "primary": false,
> >           "shard": 4,
> >           "errors": [],
> >           "size": 1048576
> >         },
> >         {
> >           "osd": 225,
> >           "primary": false,
> >           "shard": 9,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 253,
> >           "primary": false,
> >           "shard": 8,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 327,
> >           "primary": false,
> >           "shard": 6,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 568,
> >           "primary": false,
> >           "shard": 2,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 610,
> >           "primary": false,
> >           "shard": 7,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 700,
> >           "primary": false,
> >           "shard": 3,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 736,
> >           "primary": false,
> >           "shard": 10,
> >           "errors": [
> >             "missing"
> >           ]
> >         },
> >         {
> >           "osd": 764,
> >           "primary": false,
> >           "shard": 5,
> >           "errors": [
> >             "missing"
> >           ]
> >         }
> >       ]
> >     }
> >   ]
> > }
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



