Hi,
can you clarify what exactly you did to get into this situation? As
for the undersized PGs, is there any chance to bring those OSDs back
online?
Regarding the incomplete PGs, I'm not sure there's much you can do if
the OSDs are lost. To me it reads like you may have
destroyed/recreated more OSDs than you should have; just recreating
OSDs with the same IDs is not sufficient if you destroyed too many
chunks, because with erasure coding each OSD only holds one chunk of
the PG. I'm afraid those objects are lost and you would have to
restore from backup. There are a couple of threads on getting a
cluster back into a healthy state, e.g. [1], but recovering the lost
chunks from Ceph itself will probably not work.
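If you want to double-check how much loss the EC pools can tolerate
and what the incomplete PGs are still waiting for, something along
these lines should help (the profile name is a placeholder, the PG ID
is taken from your output, and the JSON field names are from memory,
so adjust as needed):

  # an EC pool can lose at most m chunks per PG (k+m profile)
  ceph osd pool ls detail
  ceph osd erasure-code-profile get <profile-name>

  # peering details of one incomplete PG; look for
  # "down_osds_we_would_probe" and "peering_blocked_by" in the
  # recovery_state section
  ceph pg 2.35 query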
Regards,
Eugen
[1] https://www.mail-archive.com/ceph-users@xxxxxxx/msg14757.html
Quoting Deep Dish <deeepdish@xxxxxxxxx>:
Hello. I really screwed up my ceph cluster. Hoping to get data off it
so I can rebuild it.
In summary, too many changes made too quickly caused the cluster to
develop incomplete PGs. Some PGs were reporting that certain OSDs
were to be probed. I've recreated those OSD IDs (empty), however this
didn't clear the incompletes. The incomplete PGs are part of EC
pools. Running 17.2.5.
This is the overall state:
  cluster:
    id:     49057622-69fc-11ed-b46e-d5acdedaae33
    health: HEALTH_WARN
            Failed to apply 1 service(s): osd.dashboard-admin-1669078094056
            1 hosts fail cephadm check
            cephadm background work is paused
            Reduced data availability: 28 pgs inactive, 28 pgs incomplete
            Degraded data redundancy: 55 pgs undersized
            2 slow ops, oldest one blocked for 4449 sec, daemons
            [osd.25,osd.50,osd.51] have slow ops.
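For context, the warnings above can be drilled into with commands
like the following (OSD ID taken from the slow-ops line; the ceph
daemon calls have to run on the host of the respective OSD, and the
exact admin-socket command names may vary by release, see
"ceph daemon osd.25 help"):

  ceph health detail                      # expand each warning
  ceph orch resume                        # resume paused cephadm background work
  ceph daemon osd.25 dump_ops_in_flight   # inspect the reported slow ops
  ceph daemon osd.25 dump_historic_slow_ops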
These are the incomplete PGs that HAVE DATA (objects > 0) [via ceph
pg ls incomplete]:
2.35 23199 0 0 0 95980273664 0 0 2477 incomplete 10s 2104'46277 28260:686871 [44,4,37,3,40,32]p44 [44,4,37,3,40,32]p44 2023-01-03T03:54:47.821280+0000 2022-12-29T18:53:09.287203+0000 14 queued for deep scrub
2.53 22821 0 0 0 94401175552 0 0 2745 remapped+incomplete 10s 2104'45845 28260:565267 [60,48,52,65,67,7]p60 [60]p60 2023-01-03T10:18:13.388383+0000 2023-01-03T10:18:13.388383+0000 408 queued for scrub
2.9f 22858 0 0 0 94555983872 0 0 2736 remapped+incomplete 10s 2104'45636 28260:759872 [56,59,3,57,5,32]p56 [56]p56 2023-01-03T10:55:49.848693+0000 2023-01-03T10:55:49.848693+0000 376 queued for scrub
2.be 22870 0 0 0 94429110272 0 0 2661 remapped+incomplete 10s 2104'45561 28260:813759 [41,31,37,9,7,69]p41 [41]p41 2023-01-03T14:02:15.790077+0000 2023-01-03T14:02:15.790077+0000 360 queued for scrub
2.e4 22953 0 0 0 94912278528 0 0 2648 remapped+incomplete 20m 2104'46048 28259:732896 [37,46,33,4,48,49]p37 [37]p37 2023-01-02T18:38:46.268723+0000 2022-12-29T18:05:47.431468+0000 18 queued for deep scrub
17.78 20169 0 0 0 84517834400 0 0 2198 remapped+incomplete 10s 3735'53405 28260:1243673 [4,37,2,36,66,0]p4 [41]p41 2023-01-03T14:21:41.563424+0000 2023-01-03T14:21:41.563424+0000 348 queued for scrub
17.d8 20328 0 0 0 85196053130 0 0 1852 remapped+incomplete 10s 3735'54458 28260:1309564 [38,65,61,37,58,39]p38 [53]p53 2023-01-02T18:32:35.371071+0000 2022-12-28T19:08:29.492244+0000 21 queued for deep scrub
At present I'm unable to reliably access my data due to the
incomplete PGs above. I'll post whatever outputs are requested (not
posting them now as they can be rather verbose). Is there hope?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx