Thus spake Brad Hubbard (bhubbard@xxxxxxxxxx) on Tuesday 29 October 2019 at 08:20:31:
> Yes, try and get the pgs healthy, then you can just re-provision the down OSDs.
>
> Run a scrub on each of these pgs and then use the commands on the
> following page to find out more information for each case.
>
> https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/
>
> Focus on the commands 'list-missing', 'list-inconsistent-obj', and
> 'list-inconsistent-snapset'.
>
> Let us know if you get stuck.
>
> P.S. There are several threads about these sorts of issues in this
> mailing list that should turn up when doing a web search.

I found this thread:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg53116.html

And I started gathering additional information to solve PG 2.2ba:

1. rados list-inconsistent-snapset 2.2ba --format=json-pretty

{
    "epoch": 192223,
    "inconsistents": [
        {
            "name": "rbd_data.b4537a2ae8944a.000000000000425f",
            "nspace": "",
            "locator": "",
            "snap": 22772,
            "errors": [
                "headless"
            ]
        },
        {
            "name": "rbd_data.b4537a2ae8944a.000000000000425f",
            "nspace": "",
            "locator": "",
            "snap": "head",
            "snapset": {
                "snap_context": {
                    "seq": 22806,
                    "snaps": [ 22805, 22804, 22674, 22619, 20536, 17248, 14270 ]
                },
                "head_exists": 1,
                "clones": [
                    { "snap": 17248, "size": 4194304, "overlap": "[0~2269184,2277376~1916928]", "snaps": [ 17248 ] },
                    { "snap": 20536, "size": 4194304, "overlap": "[0~2269184,2277376~1916928]", "snaps": [ 20536 ] },
                    { "snap": 22625, "size": 4194304, "overlap": "[0~2269184,2277376~1916928]", "snaps": [ 22619 ] },
                    { "snap": 22674, "size": 4194304, "overlap": "[266240~4096]", "snaps": [ 22674 ] },
                    { "snap": 22805, "size": 4194304, "overlap": "[0~942080,958464~901120,1875968~16384,1908736~360448,2285568~1908736]", "snaps": [ 22805, 22804 ] }
                ]
            },
            "errors": [
                "extra_clones"
            ],
            "extra clones": [
                22772
            ]
        }
    ]
}

2.a ceph-objectstore-tool from osd.29 and osd.42:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29/ --pgid 2.2ba --op list
rbd_data.b4537a2ae8944a.000000000000425f

["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":17248,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":20536,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22625,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22674,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22805,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":-2,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]

2.b ceph-objectstore-tool from osd.30:

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-30/ --pgid 2.2ba --op list rbd_data.b4537a2ae8944a.000000000000425f

["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":17248,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":20536,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22625,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22674,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22805,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]
["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":-2,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]

I needed to shut down the OSD service (30, 29, then 42) to get any result at all. Otherwise I only got these errors:

Mount failed with '(11) Resource temporarily unavailable'

or:

OSD has the store locked

Without my doing anything else, 2 OSDs started flapping (osd.38 and osd.27), with 1 PG switching between inactive, down and up…:

HEALTH_ERR 2 osds down; 12128/37456062 objects misplaced (0.032%); 4 scrub errors; Reduced data availability: 1 pg inactive, 1 pg down; Possible data damage: 2 pgs inconsistent; Degraded data redundancy: 2264342/37456062 objects degraded (6.045%), 859 pgs degraded
OSD_DOWN 2 osds down
    osd.27 (root=default,datacenter=IPR,room=11B,rack=baie2,host=r730xd3) is down
    osd.38 (root=default,datacenter=IPR,room=11B,rack=baie2,host=r740xd1) is down
OBJECT_MISPLACED 12128/37456062 objects misplaced (0.032%)
OSD_SCRUB_ERRORS 4 scrub errors
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg down
    pg 2.448 is down, acting [0]
PG_DAMAGED Possible data damage: 2 pgs inconsistent
    pg 2.2ba is active+clean+inconsistent, acting [42,29,30]
    pg 2.2bb is active+clean+inconsistent, acting [25,42,18]
    pg 2.371 is active+undersized+degraded+remapped+inconsistent+backfill_wait, acting [42,9]
…

If I understood the previous thread correctly, I should remove the snapid 22772 clone from osd.29 and osd.42 (note: the JSON object spec has to be quoted for the shell):

ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-29/ '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' remove

ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-42/ '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.000000000000425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]' remove

Do I still need to shut down the service first, or am I missing something important?
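To convince myself before removing anything, I cross-checked the two outputs with a quick Python sketch (the snapid lists are copied from the outputs above; snapid -2 is the head object, not a clone; variable names are mine). A clone that exists on disk but is absent from the head's snapset is exactly what scrub flags as "headless" / "extra_clones":

```python
import json

# Clone snapids reported on disk by `ceph-objectstore-tool --op list`
# on osd.29 / osd.42 (the snapid -2 entry is the head object itself).
on_disk = [17248, 20536, 22625, 22674, 22772, 22805, -2]

# Clones recorded in the head's snapset, per `rados list-inconsistent-snapset`
# (excerpt of the JSON above, sizes/overlaps omitted).
snapset = json.loads("""
{"clones": [{"snap": 17248}, {"snap": 20536}, {"snap": 22625},
            {"snap": 22674}, {"snap": 22805}]}
""")
known = {c["snap"] for c in snapset["clones"]}

# Any on-disk clone the snapset does not know about is an extra clone;
# that is the object the remove command should target.
extra = sorted(s for s in on_disk if s != -2 and s not in known)
print(extra)  # [22772]
```

This reproduces the "extra clones": [22772] verdict from scrub, so the removal does seem to target the right object.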
Sorry for the noob's noise, I'm not really comfortable with the current state of my cluster -_-
--
Gardais Jérémy
Institut de Physique de Rennes
Université Rennes 1
Phone: 02-23-23-68-60
Mail & good practice: http://fr.wikipedia.org/wiki/Nétiquette
-------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com