Re: Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

On Tue, Oct 29, 2019 at 9:09 PM Jérémy Gardais
<jeremy.gardais@xxxxxxxxxxxxxxx> wrote:
>
> Thus spake Brad Hubbard (bhubbard@xxxxxxxxxx) on Tuesday 29 October 2019 at 08:20:31:
> > Yes, try and get the pgs healthy, then you can just re-provision the down OSDs.
> >
> > Run a scrub on each of these pgs and then use the commands on the
> > following page to find out more information for each case.
> >
> > https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/
> >
> > Focus on the commands 'list-missing', 'list-inconsistent-obj', and
> > 'list-inconsistent-snapset'.
> >
> > Let us know if you get stuck.
> >
> > P.S. There are several threads about these sorts of issues in this
> > mailing list that should turn up when doing a web search.
>
> I found this thread:
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg53116.html

That looks like the same issue.

>
> And I started gathering additional information to solve PG 2.2ba:
> 1. rados list-inconsistent-snapset 2.2ba --format=json-pretty
> {
>     "epoch": 192223,
>     "inconsistents": [
>         {
>             "name": "rbd_data.b4537a2ae8944a.000000000000425f",
>             "nspace": "",
>             "locator": "",
>             "snap": 22772,
>             "errors": [
>                 "headless"
>             ]
>         },
>         {
>             "name": "rbd_data.b4537a2ae8944a.000000000000425f",
>             "nspace": "",
>             "locator": "",
>             "snap": "head",
>             "snapset": {
>                 "snap_context": {
>                     "seq": 22806,
>                     "snaps": [
>                         22805,
>                         22804,
>                         22674,
>                         22619,
>                         20536,
>                         17248,
>                         14270
>                     ]
>                 },
>                 "head_exists": 1,
>                 "clones": [
>                     {
>                         "snap": 17248,
>                         "size": 4194304,
>                         "overlap": "[0~2269184,2277376~1916928]",
>                         "snaps": [
>                             17248
>                         ]
>                     },
>                     {
>                         "snap": 20536,
>                         "size": 4194304,
>                         "overlap": "[0~2269184,2277376~1916928]",
>                         "snaps": [
>                             20536
>                         ]
>                     },
>                     {
>                         "snap": 22625,
>                         "size": 4194304,
>                         "overlap": "[0~2269184,2277376~1916928]",
>                         "snaps": [
>                             22619
>                         ]
>                     },
>                     {
>                         "snap": 22674,
>                         "size": 4194304,
>                         "overlap": "[266240~4096]",
>                         "snaps": [
>                             22674
>                         ]
>                     },
>                     {
>                         "snap": 22805,
>                         "size": 4194304,
>                         "overlap": "[0~942080,958464~901120,1875968~16384,1908736~360448,2285568~1908736]",
>                         "snaps": [
>                             22805,
>                             22804
>                         ]
>                     }
>                 ]
>             },
>             "errors": [
>                 "extra_clones"
>             ],
>             "extra clones": [
>                 22772
>             ]
>         }
>     ]
> }
>
> 2.a ceph-objectstore-tool from osd.29 and osd.42 :
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":17248,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":20536,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22625,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22674,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22805,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":-2,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
>
> 2.b ceph-objectstore-tool from osd.30 :
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":17248,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":20536,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22625,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22674,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22805,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
> ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":-2,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
>
> I needed to shut down the OSD service (30, 29 then 42) to be able to
> get any result. Otherwise I only got these errors:
> Mount failed with '(11) Resource temporarily unavailable'
> Or
> OSD has the store locked

Yes, the object store tool requires the OSD to be shut down.
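
For reference, the per-OSD cycle looks roughly like this (the systemd unit
name and data path are assumptions based on a standard deployment, adjust
to your hosts):

systemctl stop ceph-osd@29     # release the store lock held by the running OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f
systemctl start ceph-osd@29    # bring the OSD back once you have what you need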

>
>
>
> Without doing anything else, 2 OSDs started flapping (osd.38 and
> osd.27) with 1 PG switching between inactive, down and up…:

Maybe you should set nodown and noout while you do these maneuvers?
That will minimise peering and recovery (data movement).
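
Something like the following around the whole maintenance window, for example:

ceph osd set noout     # stopped OSDs won't be marked out, so no backfill is triggered
ceph osd set nodown    # OSDs won't be marked down, which avoids repeated peering
# ... stop OSDs, run ceph-objectstore-tool, restart them ...
ceph osd unset nodown
ceph osd unset noout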

>
> HEALTH_ERR 2 osds down; 12128/37456062 objects misplaced (0.032%); 4 scrub errors; Reduced data availability: 1 pg inactive, 1 pg down; Possible data damage: 2 pgs inconsistent; Degraded data redundancy: 2264342/37456062 objects degraded (6.045%), 859 pgs degraded
> OSD_DOWN 2 osds down
>     osd.27 (root=default,datacenter=IPR,room=11B,rack=baie2,host=r730xd3) is down
>     osd.38 (root=default,datacenter=IPR,room=11B,rack=baie2,host=r740xd1) is down
> OBJECT_MISPLACED 12128/37456062 objects misplaced (0.032%)
> OSD_SCRUB_ERRORS 4 scrub errors
> PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
>     pg 2.448 is down, acting [0]
> PG_DAMAGED Possible data damage: 2 pgs inconsistent
>   pg 2.2ba is active+clean+inconsistent, acting [42,29,30]
>   pg 2.2bb is active+clean+inconsistent, acting [25,42,18]
>   pg 2.371 is active+undersized+degraded+remapped+inconsistent+backfill_wait,acting [42,9]
>
>
>
> If I understood the previous thread correctly, I should remove the
> clone with snapid 22772 from osd.29 and osd.42:
> ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-29/ ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}] remove
> ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-42/ ["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}] remove

That looks right.
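
One practical note: you will likely need to single-quote the JSON object
spec so the shell doesn't eat the brackets and quotes. A rough sketch for
one OSD (same caveat as before about unit names, and please double-check
the object spec before removing anything):

systemctl stop ceph-osd@29
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29/ --pgid 2.2ba \
  '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' \
  remove
systemctl start ceph-osd@29

Repeat for osd.42, then re-run a deep scrub on the PG to confirm the
inconsistency is gone:

ceph pg deep-scrub 2.2ba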

>
> Do I still need to shut down the service first, or am I missing something important?

Yes.

>
> Sorry for the noob noise, I'm not really comfortable with the current
> state of my cluster -_-

You should probably try and work out what caused the issue and take
steps to minimise the likelihood of a recurrence. This is not expected
behaviour in a correctly configured and stable environment.

>
> --
> Gardais Jérémy
> Institut de Physique de Rennes
> Université Rennes 1
> Phone: 02-23-23-68-60
> Mail & good practices: http://fr.wikipedia.org/wiki/Nétiquette
> -------------------------------

-- 
Cheers,
Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



