Re: Recover pgs from failed osds

Just to confirm, each OSD node has 7 OSDs with a 4 GB memory_target? That leaves only 4 GB of RAM for the rest of the system, and under heavy load the OSDs can use even more. I would suggest reducing the memory_target to 3 GB and seeing if they start successfully.
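
For example, assuming a cluster-wide change (or the equivalent in ceph.conf / your ceph-ansible group_vars), something like:

# ceph config set osd osd_memory_target 3221225472

3221225472 bytes is 3 GiB.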


Zitat von Vahideh Alinouri <vahideh.alinouri@xxxxxxxxx>:

osd_memory_target is 4294967296 (4 GiB).
Cluster setup:
3 mons, 3 mgrs, 21 OSDs on 3 ceph-osd nodes using the lvm scenario. Each
ceph-osd node has 32 GB RAM, a 4-core CPU and 4 TB OSD disks; 9 OSDs have
block.wal on SSDs. The public network is 1G and the cluster network is 10G.
The cluster was installed and upgraded using ceph-ansible.

On Thu, Aug 27, 2020 at 7:01 PM Eugen Block <eblock@xxxxxx> wrote:

What is the memory_target for your OSDs? Can you share more details
about your setup? You write about high memory usage; are the OSD nodes
affected by the OOM killer? You could try to reduce the osd_memory_target
and see if that helps bring the OSDs back up. Splitting the PGs is a
very heavy operation.


Zitat von Vahideh Alinouri <vahideh.alinouri@xxxxxxxxx>:

> The Ceph cluster was upgraded from Nautilus to Octopus. On the ceph-osd nodes
> we have high I/O wait.
>
> After increasing one pool's pg_num from 64 to 128 according to the warning
> message (more objects per pg), CPU load and RAM usage on the ceph-osd nodes
> went up until the whole cluster crashed. Three osds, one on each host, are
> stuck in the down state (osd.34, osd.35, osd.40).
>
> Starting a down osd's service causes high RAM usage and CPU load and makes
> the ceph-osd node crash until the osd service fails.
>
> The active mgr service on each mon host crashes after consuming almost all
> available RAM on the physical host.
>
> I need to recover the pgs and resolve the corruption. How can I recover the
> unknown and down pgs? Is there any way to start up the failed osds?
>
>
> The following steps have been done:
>
> 1- The osd nodes' kernel was upgraded to 5.4.2 before the ceph cluster
> upgrade. Reverting to the previous kernel 4.2.1 was tested to reduce the
> iowait, but it had no effect.
>
> 2- Recovered 11 pgs from the failed osds by exporting them with the
> ceph-objectstore-tool utility and importing them on other osds. The result:
> 9 pgs are "down" and 2 pgs are "unknown".
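>
> (Roughly the commands used per pg, with osd.34, osd.37 and pg 2.39 as an
> example and the osd services stopped first:
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
>     --pgid 2.39 --op export --file /tmp/pg2.39.export
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 \
>     --op import --file /tmp/pg2.39.export
> )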
>
> 2-1) 9 pgs were exported and imported successfully, but their status is
> "down" because peering is blocked by the 3 failed osds ("peering_blocked_by").
> I cannot mark the osds lost, to avoid losing the unknown pgs. These pgs are
> only KBs to MBs in size.
>
> "peering_blocked_by": [
>
> {
>
> "osd": 34,
>
> "current_lost_at": 0,
>
> "comment": "starting or marking this osd lost may let us proceed"
>
> },
>
> {
>
> "osd": 35,
>
> "current_lost_at": 0,
>
> "comment": "starting or marking this osd lost may let us proceed"
>
> },
>
> {
>
> "osd": 40,
>
> "current_lost_at": 0,
>
> "comment": "starting or marking this osd lost may let us proceed"
>
> }
>
> ]
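>
> (The "comment" refers to running, for example, "ceph osd lost 34
> --yes-i-really-mean-it", which I am avoiding because it gives up that osd's
> copy of the data.)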
>
>
> 2-2) 1 pg (2.39) was exported and imported successfully, but after starting
> the osd service it was imported to, RAM and CPU consumption on the ceph-osd
> node increase and make the node crash until the osd service fails. The other
> osds on that node become "down". The pg status is "unknown". I cannot use
> "force-create-pg" because it would lose data. pg 2.39 is 19G in size.
>
> # ceph pg map 2.39
>
> osdmap e40347 pg 2.39 (2.39) -> up [32,37] acting [32,37]
>
> # ceph pg 2.39 query
>
> Error ENOENT: i don't have pgid 2.39
>
>
> pg 2.39 info on the failed osd:
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --op info
> --pgid 2.39
>
> {
>     "pgid": "2.39",
>     "last_update": "35344'6456084",
>     "last_complete": "35344'6456084",
>     "log_tail": "35344'6453182",
>     "last_user_version": 10595821,
>     "last_backfill": "MAX",
>     "purged_snaps": [],
>     "history": {
>         "epoch_created": 146,
>         "epoch_pool_created": 79,
>         "last_epoch_started": 25208,
>         "last_interval_started": 25207,
>         "last_epoch_clean": 25208,
>         "last_interval_clean": 25207,
>         "last_epoch_split": 370,
>         "last_epoch_marked_full": 0,
>         "same_up_since": 8347,
>         "same_interval_since": 25207,
>         "same_primary_since": 8321,
>         "last_scrub": "35328'6440139",
>         "last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>         "last_deep_scrub": "35261'6031075",
>         "last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",
>         "last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>         "prior_readable_until_ub": 0
>     },
>     "stats": {
>         "version": "35344'6456082",
>         "reported_seq": "11733156",
>         "reported_epoch": "35344",
>         "state": "active+clean",
>         "last_fresh": "2020-08-19T14:16:18.587435+0430",
>         "last_change": "2020-08-19T12:00:59.377747+0430",
>         "last_active": "2020-08-19T14:16:18.587435+0430",
>         "last_peered": "2020-08-19T14:16:18.587435+0430",
>         "last_clean": "2020-08-19T14:16:18.587435+0430",
>         "last_became_active": "2020-08-06T00:23:51.016769+0430",
>         "last_became_peered": "2020-08-06T00:23:51.016769+0430",
>         "last_unstale": "2020-08-19T14:16:18.587435+0430",
>         "last_undegraded": "2020-08-19T14:16:18.587435+0430",
>         "last_fullsized": "2020-08-19T14:16:18.587435+0430",
>         "mapping_epoch": 8347,
>         "log_start": "35344'6453182",
>         "ondisk_log_start": "35344'6453182",
>         "created": 146,
>         "last_epoch_clean": 25208,
>         "parent": "0.0",
>         "parent_split_bits": 7,
>         "last_scrub": "35328'6440139",
>         "last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>         "last_deep_scrub": "35261'6031075",
>         "last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",
>         "last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>         "log_size": 2900,
>         "ondisk_log_size": 2900,
>         "stats_invalid": false,
>         "dirty_stats_invalid": false,
>         "omap_stats_invalid": false,
>         "hitset_stats_invalid": false,
>         "hitset_bytes_stats_invalid": false,
>         "pin_stats_invalid": false,
>         "manifest_stats_invalid": false,
>         "snaptrimq_len": 0,
>         "stat_sum": {
>             "num_bytes": 19749578960,
>             "num_objects": 2442,
>             "num_object_clones": 20,
>             "num_object_copies": 7326,
>             "num_objects_missing_on_primary": 0,
>             "num_objects_missing": 0,
>             "num_objects_degraded": 0,
>             "num_objects_misplaced": 0,
>             "num_objects_unfound": 0,
>             "num_objects_dirty": 2442,
>             "num_whiteouts": 0,
>             "num_read": 16120686,
>             "num_read_kb": 82264126,
>             "num_write": 19731882,
>             "num_write_kb": 379030181,
>             "num_scrub_errors": 0,
>             "num_shallow_scrub_errors": 0,
>             "num_deep_scrub_errors": 0,
>             "num_objects_recovered": 2861,
>             "num_bytes_recovered": 21673259070,
>             "num_keys_recovered": 32,
>             "num_objects_omap": 2,
>             "num_objects_hit_set_archive": 0,
>             "num_bytes_hit_set_archive": 0,
>             "num_flush": 0,
>             "num_flush_kb": 0,
>             "num_evict": 0,
>             "num_evict_kb": 0,
>             "num_promote": 0,
>             "num_flush_mode_high": 0,
>             "num_flush_mode_low": 0,
>             "num_evict_mode_some": 0,
>             "num_evict_mode_full": 0,
>             "num_objects_pinned": 0,
>             "num_legacy_snapsets": 0,
>             "num_large_omap_objects": 0,
>             "num_objects_manifest": 0,
>             "num_omap_bytes": 152,
>             "num_omap_keys": 16,
>             "num_objects_repaired": 0
>         },
>         "up": [
>             40,
>             35,
>             34
>         ],
>         "acting": [
>             40,
>             35,
>             34
>         ],
>         "avail_no_missing": [],
>         "object_location_counts": [],
>         "blocked_by": [],
>         "up_primary": 40,
>         "acting_primary": 40,
>         "purged_snaps": []
>     },
>     "empty": 0,
>     "dne": 0,
>     "incomplete": 0,
>     "last_epoch_started": 25208,
>     "hit_set_history": {
>         "current_last_update": "0'0",
>         "history": []
>     }
> }
>
>
> pg 2.39 info on the osd it was imported to:
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 --op info
> --pgid 2.39
>
> PG '2.39' not found
>
>
> 2-3) 1 pg (2.79) is lost! This pg is not found on any of the three failed
> osds (osd.34, osd.35, osd.40)! Its status is "unknown". Exporting pg 2.79
> fails with "PG '2.79' not found".
>
>
>
> # ceph pg map 2.79
>
> Error ENOENT: i don't have pgid 2.79
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --op info
> --pgid 2.79
>
> PG '2.79' not found
>
>
> 3- Tried https://gitlab.lbader.de/kryptur/ceph-recovery/tree/master, but it
> does not work with recent ceph versions; it was only tested on the "hammer"
> release.
>
> 4- Tried https://ceph.io/planet/recovering-from-a-complete-node-failure/,
> but in the lvm scenario I could not mount the failed osd's LV to a new
> /var/lib/ceph/osd/ceph-x, and could not prepare and activate a new osd on
> the failed osd's disk.
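>
> (Roughly what was attempted; the actual VG/LV names differ per host:
>
> # ceph-volume lvm list                 # locate the LV backing the failed osd
> # ceph-volume lvm activate --all       # try to re-activate osds from their LVs
> )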
>
> 5- Set min_size=1 on the pool the down pgs belong to and restarted the osds
> the pgs were imported to, but no change.
>
> 6- Set min_size=1 on the pool pg 2.39 belongs to and restarted the osds the
> pg was imported to, but no change.
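>
> (The setting was changed with something like:
>
> # ceph osd pool set <pool-name> min_size 1
> )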
>
> 7- Repaired the failed osds using ceph-objectstore-tool, marked them "in"
> and started them, but no change.
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-x --op repair
>
>
> 8- Ran repair on the 2 unknown pgs, but no change.
>
> # ceph pg repair 2.39
>
> # ceph pg repair 2.79
>
> 9- Forced recovery of the 2 unknown pgs, but no change.
>
> # ceph pg force-recovery 2.39
>
> # ceph pg force-recovery 2.79
>
> 10- Checked the PID limit on the ceph-osd nodes because the osd services
> failed to start:
>
> kernel.pid_max = 4194304
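>
> (Checked with, for example:
>
> # sysctl kernel.pid_max
> )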
>
> 11- Raised osd_op_thread_suicide_timeout to 900, but no change.
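>
> (Raised with something like:
>
> # ceph config set osd osd_op_thread_suicide_timeout 900
> )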





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



