Hi all,
I hope someone can help me. After restarting a node of my 2-node cluster, I suddenly get this:

root@yak2 /var/www/projects # ceph -s
  cluster:
    id:     749b2473-9300-4535-97a6-ee6d55008a1b
    health: HEALTH_WARN
            Reduced data availability: 200 pgs inactive

  services:
    mon: 3 daemons, quorum yak1,yak2,yak0
    mgr: yak0.planwerk6.de(active), standbys: yak1.planwerk6.de, yak2.planwerk6.de
    mds: cephfs-1/1/1 up {0=yak1.planwerk6.de=up:active}, 1 up:standby
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   2 pools, 200 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             200 unknown

And this:

root@yak2 /var/www/projects # ceph health detail
HEALTH_WARN Reduced data availability: 200 pgs inactive
PG_AVAILABILITY Reduced data availability: 200 pgs inactive
    pg 1.34 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.35 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.36 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.37 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.38 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.39 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3c is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3d is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3e is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.3f is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.40 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.41 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.42 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.43 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.44 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.45 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.46 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.47 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.48 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.49 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4c is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 1.4d is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.34 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.35 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.36 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.38 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.39 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3c is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3d is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3e is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.3f is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.40 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.41 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.42 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.43 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.44 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.45 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.46 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.47 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.48 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.49 is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4a is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4b is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4e is stuck inactive for 3506.815664, current state unknown, last acting []
    pg 2.4f is stuck inactive for 3506.815664, current state unknown, last acting []

But if I query an individual PG I get this:

root@yak1 /var/www/projects # ceph pg 1.49 query
{
    "state": "active+clean",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 162,
    "up": [0, 1],
    "acting": [0, 1],
    "acting_recovery_backfill": ["0", "1"],
    "info": {
        "pgid": "1.49",
        "last_update": "127'38077",
        "last_complete": "127'38077",
        "log_tail": "127'35000",
        "last_user_version": 38077,
        "last_backfill": "MAX",
        "last_backfill_bitwise": 0,
        "purged_snaps": [],
        "history": {
            "epoch_created": 10, "epoch_pool_created": 10,
            "last_epoch_started": 159, "last_interval_started": 158,
            "last_epoch_clean": 159, "last_interval_clean": 158,
            "last_epoch_split": 0, "last_epoch_marked_full": 0,
            "same_up_since": 158, "same_interval_since": 158, "same_primary_since": 135,
            "last_scrub": "127'36909", "last_scrub_stamp": "2019-02-20 15:02:45.204342",
            "last_deep_scrub": "127'36714", "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
            "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342"
        },
        "stats": {
            "version": "127'38077", "reported_seq": "58934", "reported_epoch": "162",
            "state": "active+clean",
            "last_fresh": "2019-02-20 19:56:56.740536",
            "last_change": "2019-02-20 19:52:27.063812",
            "last_active": "2019-02-20 19:56:56.740536",
            "last_peered": "2019-02-20 19:56:56.740536",
            "last_clean": "2019-02-20 19:56:56.740536",
            "last_became_active": "2019-02-20 19:52:27.062689",
            "last_became_peered": "2019-02-20 19:52:27.062689",
            "last_unstale": "2019-02-20 19:56:56.740536",
            "last_undegraded": "2019-02-20 19:56:56.740536",
            "last_fullsized": "2019-02-20 19:56:56.740536",
            "mapping_epoch": 158,
            "log_start": "127'35000", "ondisk_log_start": "127'35000",
            "created": 10, "last_epoch_clean": 159,
            "parent": "0.0", "parent_split_bits": 0,
            "last_scrub": "127'36909", "last_scrub_stamp": "2019-02-20 15:02:45.204342",
            "last_deep_scrub": "127'36714", "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
            "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342",
            "log_size": 3077, "ondisk_log_size": 3077,
            "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false,
            "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false,
            "pin_stats_invalid": false, "manifest_stats_invalid": true,
            "snaptrimq_len": 0,
            "stat_sum": {
                "num_bytes": 478347970, "num_objects": 12052, "num_object_clones": 0,
                "num_object_copies": 24104, "num_objects_missing_on_primary": 0,
                "num_objects_missing": 0, "num_objects_degraded": 0,
                "num_objects_misplaced": 0, "num_objects_unfound": 0,
                "num_objects_dirty": 12052, "num_whiteouts": 0,
                "num_read": 20186, "num_read_kb": 1952018,
                "num_write": 38927, "num_write_kb": 484756,
                "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0,
                "num_objects_recovered": 6, "num_bytes_recovered": 4101, "num_keys_recovered": 0,
                "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0,
                "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0,
                "num_flush_mode_high": 0, "num_flush_mode_low": 0,
                "num_evict_mode_some": 0, "num_evict_mode_full": 0,
                "num_objects_pinned": 0, "num_legacy_snapsets": 0,
                "num_large_omap_objects": 0, "num_objects_manifest": 0
            },
            "up": [0, 1],
            "acting": [0, 1],
            "blocked_by": [],
            "up_primary": 0,
            "acting_primary": 0,
            "purged_snaps": []
        },
        "empty": 0,
        "dne": 0,
        "incomplete": 0,
        "last_epoch_started": 159,
        "hit_set_history": {
            "current_last_update": "0'0",
            "history": []
        }
    },
    "peer_info": [
        {
            "peer": "1",
            "pgid": "1.49",
            "last_update": "127'38077",
            "last_complete": "127'38077",
            "log_tail": "127'35000",
            "last_user_version": 38077,
            "last_backfill": "MAX",
            "last_backfill_bitwise": 0,
            "purged_snaps": [],
            "history": {
                "epoch_created": 10, "epoch_pool_created": 10,
                "last_epoch_started": 159, "last_interval_started": 158,
                "last_epoch_clean": 159, "last_interval_clean": 158,
                "last_epoch_split": 0, "last_epoch_marked_full": 0,
                "same_up_since": 158, "same_interval_since": 158, "same_primary_since": 135,
                "last_scrub": "127'36909", "last_scrub_stamp": "2019-02-20 15:02:45.204342",
                "last_deep_scrub": "127'36714", "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
                "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342"
            },
            "stats": {
                "version": "127'38077", "reported_seq": "58745", "reported_epoch": "134",
                "state": "active+undersized+degraded",
                "last_fresh": "2019-02-20 19:06:19.180016",
                "last_change": "2019-02-20 19:04:39.483332",
                "last_active": "2019-02-20 19:06:19.180016",
                "last_peered": "2019-02-20 19:06:19.180016",
                "last_clean": "2019-02-20 18:23:33.675145",
                "last_became_active": "2019-02-20 19:04:39.483332",
                "last_became_peered": "2019-02-20 19:04:39.483332",
                "last_unstale": "2019-02-20 19:06:19.180016",
                "last_undegraded": "2019-02-20 19:04:39.477829",
                "last_fullsized": "2019-02-20 19:04:39.477717",
                "mapping_epoch": 158,
                "log_start": "127'35000", "ondisk_log_start": "127'35000",
                "created": 10, "last_epoch_clean": 124,
                "parent": "0.0", "parent_split_bits": 0,
                "last_scrub": "127'36909", "last_scrub_stamp": "2019-02-20 15:02:45.204342",
                "last_deep_scrub": "127'36714", "last_deep_scrub_stamp": "2019-02-16 07:55:15.205861",
                "last_clean_scrub_stamp": "2019-02-20 15:02:45.204342",
                "log_size": 3077, "ondisk_log_size": 3077,
                "stats_invalid": false, "dirty_stats_invalid": false, "omap_stats_invalid": false,
                "hitset_stats_invalid": false, "hitset_bytes_stats_invalid": false,
                "pin_stats_invalid": false, "manifest_stats_invalid": true,
                "snaptrimq_len": 0,
                "stat_sum": {
                    "num_bytes": 478347970, "num_objects": 12052, "num_object_clones": 0,
                    "num_object_copies": 24104, "num_objects_missing_on_primary": 0,
                    "num_objects_missing": 0, "num_objects_degraded": 12052,
                    "num_objects_misplaced": 0, "num_objects_unfound": 0,
                    "num_objects_dirty": 12052, "num_whiteouts": 0,
                    "num_read": 20186, "num_read_kb": 1952018,
                    "num_write": 38927, "num_write_kb": 484756,
                    "num_scrub_errors": 0, "num_shallow_scrub_errors": 0, "num_deep_scrub_errors": 0,
                    "num_objects_recovered": 6, "num_bytes_recovered": 4101, "num_keys_recovered": 0,
                    "num_objects_omap": 0, "num_objects_hit_set_archive": 0, "num_bytes_hit_set_archive": 0,
                    "num_flush": 0, "num_flush_kb": 0, "num_evict": 0, "num_evict_kb": 0, "num_promote": 0,
                    "num_flush_mode_high": 0, "num_flush_mode_low": 0,
                    "num_evict_mode_some": 0, "num_evict_mode_full": 0,
                    "num_objects_pinned": 0, "num_legacy_snapsets": 0,
                    "num_large_omap_objects": 0, "num_objects_manifest": 0
                },
                "up": [0, 1],
                "acting": [0, 1],
                "blocked_by": [],
                "up_primary": 0,
                "acting_primary": 0,
                "purged_snaps": []
            },
            "empty": 0,
            "dne": 0,
            "incomplete": 0,
            "last_epoch_started": 159,
            "hit_set_history": {
                "current_last_update": "0'0",
                "history": []
            }
        }
    ],
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "enter_time": "2019-02-20 19:52:27.027151",
            "might_have_unfound": [],
            "recovery_progress": {
                "backfill_targets": [],
                "waiting_on_backfill": [],
                "last_backfill_started": "MIN",
                "backfill_info": {
                    "begin": "MIN",
                    "end": "MIN",
                    "objects": []
                },
                "peer_backfill_info": [],
                "backfills_in_flight": [],
                "recovering": [],
                "pg_backend": {
                    "pull_from_peer": [],
                    "pushing": []
                }
            },
            "scrub": {
                "scrubber.epoch_start": "0",
                "scrubber.active": false,
                "scrubber.state": "INACTIVE",
                "scrubber.start": "MIN",
                "scrubber.end": "MIN",
                "scrubber.max_end": "MIN",
                "scrubber.subset_last_update": "0'0",
                "scrubber.deep": false,
                "scrubber.waiting_on_whom": []
            }
        },
        {
            "name": "Started",
            "enter_time": "2019-02-20 19:52:25.976144"
        }
    ],
    "agent_state": {}
}

I wonder what this all means and how to get out of this situation. The cluster seems to work normally, but it's quite disconcerting, as you can probably imagine. Could it be a firewall issue? I'm not aware of any changes, and I don't see any peering problems...

Thank you
Ranjan