Re: Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

Thus spake Brad Hubbard (bhubbard@xxxxxxxxxx) on Tuesday, 29 October 2019 at 08:20:31:
> Yes, try and get the pgs healthy, then you can just re-provision the down OSDs.
>
> Run a scrub on each of these pgs and then use the commands on the
> following page to find out more information for each case.
>
> https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/
>
> Focus on the commands 'list-missing', 'list-inconsistent-obj', and
> 'list-inconsistent-snapset'.
>
> Let us know if you get stuck.
>
> P.S. There are several threads about these sorts of issues in this
> mailing list that should turn up when doing a web search.

I found this thread:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg53116.html
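
Following Brad's advice, I assume the PG first needs a fresh scrub so that the reports below are up to date, something like:

ceph pg scrub 2.2ba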

And I started gathering additional information to solve PG 2.2ba:
1. rados list-inconsistent-snapset 2.2ba --format=json-pretty
{
    "epoch": 192223,
    "inconsistents": [
        {
            "name": "rbd_data.b4537a2ae8944a.000000000000425f",
            "nspace": "",
            "locator": "",
            "snap": 22772,
            "errors": [
                "headless"
            ]
        },
        {
            "name": "rbd_data.b4537a2ae8944a.000000000000425f",
            "nspace": "",
            "locator": "",
            "snap": "head",
            "snapset": {
                "snap_context": {
                    "seq": 22806,
                    "snaps": [
                        22805,
                        22804,
                        22674,
                        22619,
                        20536,
                        17248,
                        14270
                    ]
                },
                "head_exists": 1,
                "clones": [
                    {
                        "snap": 17248,
                        "size": 4194304,
                        "overlap": "[0~2269184,2277376~1916928]",
                        "snaps": [
                            17248
                        ]
                    },
                    {
                        "snap": 20536,
                        "size": 4194304,
                        "overlap": "[0~2269184,2277376~1916928]",
                        "snaps": [
                            20536
                        ]
                    },
                    {
                        "snap": 22625,
                        "size": 4194304,
                        "overlap": "[0~2269184,2277376~1916928]",
                        "snaps": [
                            22619
                        ]
                    },
                    {
                        "snap": 22674,
                        "size": 4194304,
                        "overlap": "[266240~4096]",
                        "snaps": [
                            22674
                        ]
                    },
                    {
                        "snap": 22805,
                        "size": 4194304,
                        "overlap": "[0~942080,958464~901120,1875968~16384,1908736~360448,2285568~1908736]",
                        "snaps": [
                            22805,
                            22804
                        ]
                    }
                ]
            },
            "errors": [
                "extra_clones"
            ],
            "extra clones": [
                22772
            ]
        }
    ]
}

2.a ceph-objectstore-tool output from osd.29 and osd.42:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":17248,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":20536,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22625,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22674,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22805,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":-2,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]

2.b ceph-objectstore-tool output from osd.30:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":17248,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":20536,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22625,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22674,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22805,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":-2,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]

I needed to shut down the OSD service (30, 29, then 42) to be able to
get any result. Otherwise I only got one of these errors:
Mount failed with '(11) Resource temporarily unavailable'
Or
OSD has the store locked
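
I assume that is expected: ceph-objectstore-tool seems to need exclusive access to the store, so each query has to be done with the daemon stopped, roughly (assuming systemd-managed OSDs, adapt the unit name as needed):

systemctl stop ceph-osd@29
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f
systemctl start ceph-osd@29

Should 'ceph osd set noout' be set before stopping a daemon (and unset afterwards) to avoid data movement while it is down?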



Without doing anything else, 2 OSDs started flapping (osd.38 and
osd.27), with 1 PG switching between inactive, down and up…:

HEALTH_ERR 2 osds down; 12128/37456062 objects misplaced (0.032%); 4 scrub errors; Reduced data availability: 1 pg inactive, 1 pg down; Possible data damage: 2 pgs inconsistent; Degraded data redundancy: 2264342/37456062 objects degraded (6.045%), 859 pgs degraded
OSD_DOWN 2 osds down
    osd.27 (root=default,datacenter=IPR,room=11B,rack=baie2,host=r730xd3) is down
    osd.38 (root=default,datacenter=IPR,room=11B,rack=baie2,host=r740xd1) is down
OBJECT_MISPLACED 12128/37456062 objects misplaced (0.032%)
OSD_SCRUB_ERRORS 4 scrub errors
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
    pg 2.448 is down, acting [0]
PG_DAMAGED Possible data damage: 2 pgs inconsistent
  pg 2.2ba is active+clean+inconsistent, acting [42,29,30]
  pg 2.2bb is active+clean+inconsistent, acting [25,42,18]
  pg 2.371 is active+undersized+degraded+remapped+inconsistent+backfill_wait, acting [42,9]
…


If I correctly understood the previous thread, the 'headless' and 'extra_clones' errors mean that the snapid 22772 clone still exists in the objectstore but is no longer referenced by the head's snapset (osd.30 does not have it at all, while osd.29 and osd.42 do), so I should remove it from osd.29 and osd.42 (the JSON quoted so the shell passes it as a single argument):
ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-29/ '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' remove
ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-42/ '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' remove
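
And afterwards I suppose the PG has to be scrubbed again to confirm the inconsistency is really gone before moving on to the next one:

ceph pg deep-scrub 2.2ba
rados list-inconsistent-snapset 2.2ba --format=json-pretty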

Do I still need to shut down the OSD services beforehand, or am I missing something important?

Sorry for the noob noise, I'm not really comfortable with the current
state of my cluster -_-

--
Gardais Jérémy
Institut de Physique de Rennes
Université Rennes 1
Telephone: 02-23-23-68-60
Mail & good practices: http://fr.wikipedia.org/wiki/Nétiquette
-------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



