Re: pg stuck in remapped+peering for a long time


I still have the pgs stuck peering. I ran ceph pg n.nn query on a few of the pgs that are stuck. The ones that are just peering have a few entries in recovery_state -> past_intervals (example at the end of this message), and the ones that are remapped+peering have a long list there. I don't know how to interpret the pg query output, but I have a feeling that I have had writes go to different nodes and that this has messed up a few objects. I am seeing a lot of network traffic between the nodes, a few hundred Mbps, which would fit with the osds trying to work out their state (9 disks with a random IO pattern would fit the level of bandwidth I'm seeing).
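To compare the stuck pgs side by side, the relevant fields can be pulled out of each query instead of read by eye. A minimal Python sketch, assuming the output of ceph pg <pgid> query has been saved to a file; the field names (recovery_state, past_intervals, probing_osds, peering_blocked_by, info.stats.blocked_by) are the ones visible in the 3.a9 output at the end of this message, and summarize_peering and load_query are hypothetical helper names.

```python
import json


def summarize_peering(query: dict) -> dict:
    """Collect the peering-related fields from `ceph pg <pgid> query` JSON."""
    summary = {
        "state": query.get("state"),
        "past_intervals": 0,
        "probing_osds": [],
        "peering_blocked_by": [],
        # info.stats.blocked_by: for 3.a9 this is [4], the OSD it requested info from.
        "blocked_by": query.get("info", {}).get("stats", {}).get("blocked_by", []),
    }
    for entry in query.get("recovery_state", []):
        # In the 3.a9 output, only the Started/Primary/Peering entry has past_intervals.
        if "past_intervals" in entry:
            summary["past_intervals"] = len(entry["past_intervals"])
            summary["probing_osds"] = entry.get("probing_osds", [])
            summary["peering_blocked_by"] = entry.get("peering_blocked_by", [])
    return summary


def load_query(path: str) -> dict:
    """Load a query saved with: ceph pg <pgid> query > <pgid>.json"""
    with open(path) as f:
        return json.load(f)
```

For example, after ceph pg 3.a9 query > 3.a9.json, summarize_peering(load_query("3.a9.json")) should report 3 past intervals, probing_osds ["4", "7"] and blocked_by [4] for the output quoted here; running it over the remapped+peering pgs would show how much longer their lists are.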

This is the full output of ceph health detail:

sudo ceph health detail
HEALTH_WARN 82 pgs peering; 82 pgs stuck inactive; 82 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests; pool images pg_num 256 > pgp_num 128
pg 3.21 is stuck inactive for 115937.161742, current state peering, last acting [7,5]
pg 3.80 is stuck inactive for 115913.708453, current state peering, last acting [8,6]
pg 3.23 is stuck inactive for 156640.618069, current state peering, last acting [8,3]
pg 3.82 is stuck inactive for 115931.967078, current state peering, last acting [1,5]
pg 3.e1 is stuck inactive for 116121.694227, current state peering, last acting [0,6]
pg 3.1c is stuck inactive for 115916.431120, current state peering, last acting [8,3]
pg 3.7e is stuck inactive for 115918.390949, current state peering, last acting [0,3]
pg 3.18 is stuck inactive for 115908.250832, current state peering, last acting [8,6]
pg 3.79 is stuck inactive for 115914.617676, current state peering, last acting [8,3]
pg 3.d8 is stuck inactive for 116341.813279, current state peering, last acting [2,6]
pg 3.1b is stuck inactive for 115905.061074, current state peering, last acting [7,4]
pg 3.d9 is stuck inactive for 156650.199216, current state peering, last acting [8,3]
pg 3.db is stuck inactive for 115915.924073, current state peering, last acting [1,5]
pg 3.d4 is stuck inactive for 115918.396086, current state peering, last acting [0,3]
pg 3.17 is stuck inactive for 115915.304764, current state peering, last acting [0,3]
pg 3.70 is stuck inactive for 115915.000395, current state peering, last acting [7,6]
pg 3.12 is stuck inactive for 115916.466955, current state peering, last acting [8,3]
pg 3.13 is stuck inactive for 244912.512309, current state remapped+peering, last acting [6,0]
pg 3.d2 is stuck inactive for 115913.708294, current state peering, last acting [8,3]
pg 3.6d is stuck inactive for 115909.860193, current state peering, last acting [8,4]
pg 3.6e is stuck inactive for 115914.617561, current state peering, last acting [8,3]
pg 3.9 is stuck inactive for 244908.745661, current state remapped+peering, last acting [4,2]
pg 3.68 is stuck inactive for 115916.701060, current state peering, last acting [7,3]
pg 3.6a is stuck inactive for 115914.617589, current state peering, last acting [8,3]
pg 3.4 is stuck inactive for 115913.708054, current state peering, last acting [8,3]
pg 3.ca is stuck inactive for 115915.923728, current state peering, last acting [0,6]
pg 3.64 is stuck inactive for 115905.061782, current state peering, last acting [7,4]
pg 3.6 is stuck inactive for 115913.708077, current state peering, last acting [8,3]
pg 3.0 is stuck inactive for 116106.189550, current state peering, last acting [8,6]
pg 3.c6 is stuck inactive for 115905.061588, current state peering, last acting [7,4]
pg 3.2 is stuck inactive for 116351.261968, current state peering, last acting [1,5]
pg 3.61 is stuck inactive for 115913.854102, current state peering, last acting [0,6]
pg 3.c0 is stuck inactive for 115916.700785, current state peering, last acting [7,3]
pg 3.c2 is stuck inactive for 115913.708368, current state peering, last acting [8,6]
pg 3.bd is stuck inactive for 115909.142185, current state peering, last acting [0,4]
pg 3.58 is stuck inactive for 116290.453805, current state peering, last acting [2,6]
pg 3.59 is stuck inactive for 156592.727428, current state peering, last acting [8,3]
pg 3.5b is stuck inactive for 115915.927480, current state peering, last acting [1,5]
pg 3.54 is stuck inactive for 115918.391135, current state peering, last acting [0,3]
pg 3.bb is stuck inactive for 115918.138327, current state peering, last acting [0,3]
pg 3.b5 is stuck inactive for 156609.811401, current state peering, last acting [7,3]
pg 3.52 is stuck inactive for 115914.617727, current state peering, last acting [8,3]
pg 3.b1 is stuck inactive for 115910.407513, current state peering, last acting [1,4]
pg 3.b3 is stuck inactive for 116204.050176, current state peering, last acting [0,6]
pg 3.af is stuck inactive for 115908.304844, current state peering, last acting [1,6]
pg 3.a8 is stuck inactive for 115909.753895, current state peering, last acting [8,5]
pg 3.4a is stuck inactive for 115913.854219, current state peering, last acting [0,6]
pg 3.a9 is stuck inactive for 115905.061347, current state peering, last acting [7,4]
pg 3.a4 is stuck inactive for 115909.753923, current state peering, last acting [8,4]
pg 3.46 is stuck inactive for 115905.061894, current state peering, last acting [7,4]
pg 3.40 is stuck inactive for 115916.701055, current state peering, last acting [7,3]
pg 3.a0 is stuck inactive for 156540.416593, current state peering, last acting [7,6]
pg 3.42 is stuck inactive for 116084.025651, current state peering, last acting [8,6]
pg 3.a1 is stuck inactive for 115905.061404, current state peering, last acting [7,5]
pg 3.a3 is stuck inactive for 156592.632676, current state peering, last acting [8,3]
pg 3.3d is stuck inactive for 115909.536349, current state peering, last acting [0,4]
pg 3.9c is stuck inactive for 115913.639973, current state peering, last acting [8,3]
pg 3.fe is stuck inactive for 115915.304682, current state peering, last acting [0,3]
pg 3.98 is stuck inactive for 115908.287692, current state peering, last acting [8,6]
pg 3.f9 is stuck inactive for 115913.708198, current state peering, last acting [8,3]
pg 3.3b is stuck inactive for 115915.304652, current state peering, last acting [0,3]
pg 3.9b is stuck inactive for 115905.061445, current state peering, last acting [7,4]
pg 3.35 is stuck inactive for 156760.780737, current state peering, last acting [7,3]
pg 3.97 is stuck inactive for 115913.854036, current state peering, last acting [0,3]
pg 3.31 is stuck inactive for 115910.565637, current state peering, last acting [1,4]
pg 3.f0 is stuck inactive for 115915.000192, current state peering, last acting [7,6]
pg 3.33 is stuck inactive for 115908.911398, current state peering, last acting [0,6]
pg 3.92 is stuck inactive for 115914.503597, current state peering, last acting [8,3]
pg 3.93 is stuck inactive for 244912.512404, current state remapped+peering, last acting [6,0]
pg 3.2f is stuck inactive for 115980.326105, current state peering, last acting [1,6]
pg 3.ed is stuck inactive for 115909.859689, current state peering, last acting [8,4]
pg 3.28 is stuck inactive for 115913.708757, current state peering, last acting [8,5]
pg 3.ee is stuck inactive for 115913.708285, current state peering, last acting [8,3]
pg 3.29 is stuck inactive for 115905.062092, current state peering, last acting [7,4]
pg 3.89 is stuck inactive for 244908.745759, current state remapped+peering, last acting [4,2]
pg 3.e8 is stuck inactive for 115916.700729, current state peering, last acting [7,3]
pg 3.24 is stuck inactive for 115909.860570, current state peering, last acting [8,4]
pg 3.ea is stuck inactive for 115913.708316, current state peering, last acting [8,3]
pg 3.84 is stuck inactive for 115913.708549, current state peering, last acting [8,3]
pg 3.e4 is stuck inactive for 115905.061352, current state peering, last acting [7,4]
pg 3.86 is stuck inactive for 115914.617720, current state peering, last acting [8,3]
pg 3.20 is stuck inactive for 156654.164647, current state peering, last acting [7,6]
pg 3.21 is stuck unclean for 115937.161932, current state peering, last acting [7,5]
pg 3.80 is stuck unclean for 115913.708641, current state peering, last acting [8,6]
pg 3.23 is stuck unclean for 156640.618257, current state peering, last acting [8,3]
pg 3.82 is stuck unclean for 115931.967266, current state peering, last acting [1,5]
pg 3.e1 is stuck unclean for 116121.694416, current state peering, last acting [0,6]
pg 3.1c is stuck unclean for 115916.431308, current state peering, last acting [8,3]
pg 3.7e is stuck unclean for 115918.391137, current state peering, last acting [0,3]
pg 3.18 is stuck unclean for 115908.251019, current state peering, last acting [8,6]
pg 3.79 is stuck unclean for 115914.617864, current state peering, last acting [8,3]
pg 3.d8 is stuck unclean for 116341.813466, current state peering, last acting [2,6]
pg 3.1b is stuck unclean for 115905.061262, current state peering, last acting [7,4]
pg 3.d9 is stuck unclean for 156650.199403, current state peering, last acting [8,3]
pg 3.db is stuck unclean for 115915.924260, current state peering, last acting [1,5]
pg 3.d4 is stuck unclean for 115918.396273, current state peering, last acting [0,3]
pg 3.17 is stuck unclean for 115915.304951, current state peering, last acting [0,3]
pg 3.70 is stuck unclean for 115915.000581, current state peering, last acting [7,6]
pg 3.12 is stuck unclean for 115916.467142, current state peering, last acting [8,3]
pg 3.13 is stuck unclean for 254650.057287, current state remapped+peering, last acting [6,0]
pg 3.d2 is stuck unclean for 115913.708481, current state peering, last acting [8,3]
pg 3.6d is stuck unclean for 115909.860380, current state peering, last acting [8,4]
pg 3.6e is stuck unclean for 115914.617747, current state peering, last acting [8,3]
pg 3.9 is stuck unclean for 255316.515662, current state remapped+peering, last acting [4,2]
pg 3.68 is stuck unclean for 115916.701246, current state peering, last acting [7,3]
pg 3.6a is stuck unclean for 115914.617775, current state peering, last acting [8,3]
pg 3.4 is stuck unclean for 115913.708241, current state peering, last acting [8,3]
pg 3.ca is stuck unclean for 115915.923915, current state peering, last acting [0,6]
pg 3.64 is stuck unclean for 115905.061969, current state peering, last acting [7,4]
pg 3.6 is stuck unclean for 115913.708264, current state peering, last acting [8,3]
pg 3.0 is stuck unclean for 116106.189737, current state peering, last acting [8,6]
pg 3.c6 is stuck unclean for 115905.061775, current state peering, last acting [7,4]
pg 3.2 is stuck unclean for 116351.262155, current state peering, last acting [1,5]
pg 3.61 is stuck unclean for 115913.854289, current state peering, last acting [0,6]
pg 3.c0 is stuck unclean for 115916.700973, current state peering, last acting [7,3]
pg 3.c2 is stuck unclean for 115913.708556, current state peering, last acting [8,6]
pg 3.bd is stuck unclean for 115909.142373, current state peering, last acting [0,4]
pg 3.58 is stuck unclean for 116290.453992, current state peering, last acting [2,6]
pg 3.59 is stuck unclean for 156592.727616, current state peering, last acting [8,3]
pg 3.5b is stuck unclean for 115915.927668, current state peering, last acting [1,5]
pg 3.54 is stuck unclean for 115918.391323, current state peering, last acting [0,3]
pg 3.bb is stuck unclean for 115918.138514, current state peering, last acting [0,3]
pg 3.b5 is stuck unclean for 156609.811589, current state peering, last acting [7,3]
pg 3.52 is stuck unclean for 115914.617914, current state peering, last acting [8,3]
pg 3.b1 is stuck unclean for 115910.407700, current state peering, last acting [1,4]
pg 3.b3 is stuck unclean for 116204.050364, current state peering, last acting [0,6]
pg 3.af is stuck unclean for 115908.305031, current state peering, last acting [1,6]
pg 3.a8 is stuck unclean for 115909.754082, current state peering, last acting [8,5]
pg 3.4a is stuck unclean for 115913.854406, current state peering, last acting [0,6]
pg 3.a9 is stuck unclean for 115905.061535, current state peering, last acting [7,4]
pg 3.a4 is stuck unclean for 115909.754111, current state peering, last acting [8,4]
pg 3.46 is stuck unclean for 115905.062087, current state peering, last acting [7,4]
pg 3.40 is stuck unclean for 115916.701248, current state peering, last acting [7,3]
pg 3.a0 is stuck unclean for 156540.416786, current state peering, last acting [7,6]
pg 3.42 is stuck unclean for 116084.025844, current state peering, last acting [8,6]
pg 3.a1 is stuck unclean for 115905.061597, current state peering, last acting [7,5]
pg 3.a3 is stuck unclean for 156592.632868, current state peering, last acting [8,3]
pg 3.3d is stuck unclean for 115909.536541, current state peering, last acting [0,4]
pg 3.9c is stuck unclean for 115913.640165, current state peering, last acting [8,3]
pg 3.fe is stuck unclean for 115915.304874, current state peering, last acting [0,3]
pg 3.98 is stuck unclean for 115908.287885, current state peering, last acting [8,6]
pg 3.f9 is stuck unclean for 115913.708390, current state peering, last acting [8,3]
pg 3.3b is stuck unclean for 115915.304844, current state peering, last acting [0,3]
pg 3.9b is stuck unclean for 115905.061638, current state peering, last acting [7,4]
pg 3.35 is stuck unclean for 156760.780929, current state peering, last acting [7,3]
pg 3.97 is stuck unclean for 115913.854229, current state peering, last acting [0,3]
pg 3.31 is stuck unclean for 115910.565829, current state peering, last acting [1,4]
pg 3.f0 is stuck unclean for 115915.000385, current state peering, last acting [7,6]
pg 3.33 is stuck unclean for 115908.911591, current state peering, last acting [0,6]
pg 3.92 is stuck unclean for 115914.503790, current state peering, last acting [8,3]
pg 3.93 is stuck unclean for 254650.057387, current state remapped+peering, last acting [6,0]
pg 3.2f is stuck unclean for 115980.326297, current state peering, last acting [1,6]
pg 3.ed is stuck unclean for 115909.859881, current state peering, last acting [8,4]
pg 3.28 is stuck unclean for 115913.708950, current state peering, last acting [8,5]
pg 3.ee is stuck unclean for 115913.708477, current state peering, last acting [8,3]
pg 3.29 is stuck unclean for 115905.062284, current state peering, last acting [7,4]
pg 3.89 is stuck unclean for 255316.515766, current state remapped+peering, last acting [4,2]
pg 3.e8 is stuck unclean for 115916.700921, current state peering, last acting [7,3]
pg 3.24 is stuck unclean for 115909.860762, current state peering, last acting [8,4]
pg 3.ea is stuck unclean for 115913.708507, current state peering, last acting [8,3]
pg 3.84 is stuck unclean for 115913.708741, current state peering, last acting [8,3]
pg 3.e4 is stuck unclean for 115905.061544, current state peering, last acting [7,4]
pg 3.86 is stuck unclean for 115914.617912, current state peering, last acting [8,3]
pg 3.20 is stuck unclean for 156654.164838, current state peering, last acting [7,6]
pg 3.ed is peering, acting [8,4]
pg 3.ee is peering, acting [8,3]
pg 3.e8 is peering, acting [7,3]
pg 3.ea is peering, acting [8,3]
pg 3.e4 is peering, acting [7,4]
pg 3.e1 is peering, acting [0,6]
pg 3.d8 is peering, acting [2,6]
pg 3.d9 is peering, acting [8,3]
pg 3.db is peering, acting [1,5]
pg 3.d4 is peering, acting [0,3]
pg 3.d2 is peering, acting [8,3]
pg 3.ca is peering, acting [0,6]
pg 3.c6 is peering, acting [7,4]
pg 3.c0 is peering, acting [7,3]
pg 3.c2 is peering, acting [8,6]
pg 3.bd is peering, acting [0,4]
pg 3.bb is peering, acting [0,3]
pg 3.b5 is peering, acting [7,3]
pg 3.b1 is peering, acting [1,4]
pg 3.b3 is peering, acting [0,6]
pg 3.af is peering, acting [1,6]
pg 3.a8 is peering, acting [8,5]
pg 3.a9 is peering, acting [7,4]
pg 3.a4 is peering, acting [8,4]
pg 3.a0 is peering, acting [7,6]
pg 3.a1 is peering, acting [7,5]
pg 3.a3 is peering, acting [8,3]
pg 3.9c is peering, acting [8,3]
pg 3.98 is peering, acting [8,6]
pg 3.9b is peering, acting [7,4]
pg 3.97 is peering, acting [0,3]
pg 3.92 is peering, acting [8,3]
pg 3.93 is remapped+peering, acting [6,0]
pg 3.89 is remapped+peering, acting [4,2]
pg 3.84 is peering, acting [8,3]
pg 3.86 is peering, acting [8,3]
pg 3.80 is peering, acting [8,6]
pg 3.82 is peering, acting [1,5]
pg 3.7e is peering, acting [0,3]
pg 3.79 is peering, acting [8,3]
pg 3.70 is peering, acting [7,6]
pg 3.6d is peering, acting [8,4]
pg 3.6e is peering, acting [8,3]
pg 3.68 is peering, acting [7,3]
pg 3.6a is peering, acting [8,3]
pg 3.64 is peering, acting [7,4]
pg 3.61 is peering, acting [0,6]
pg 3.58 is peering, acting [2,6]
pg 3.59 is peering, acting [8,3]
pg 3.5b is peering, acting [1,5]
pg 3.54 is peering, acting [0,3]
pg 3.52 is peering, acting [8,3]
pg 3.4a is peering, acting [0,6]
pg 3.46 is peering, acting [7,4]
pg 3.40 is peering, acting [7,3]
pg 3.42 is peering, acting [8,6]
pg 3.3d is peering, acting [0,4]
pg 3.3b is peering, acting [0,3]
pg 3.35 is peering, acting [7,3]
pg 3.31 is peering, acting [1,4]
pg 3.33 is peering, acting [0,6]
pg 3.2f is peering, acting [1,6]
pg 3.28 is peering, acting [8,5]
pg 3.29 is peering, acting [7,4]
pg 3.24 is peering, acting [8,4]
pg 3.20 is peering, acting [7,6]
pg 3.21 is peering, acting [7,5]
pg 3.23 is peering, acting [8,3]
pg 3.1c is peering, acting [8,3]
pg 3.18 is peering, acting [8,6]
pg 3.1b is peering, acting [7,4]
pg 3.17 is peering, acting [0,3]
pg 3.12 is peering, acting [8,3]
pg 3.13 is remapped+peering, acting [6,0]
pg 3.9 is remapped+peering, acting [4,2]
pg 3.4 is peering, acting [8,3]
pg 3.6 is peering, acting [8,3]
pg 3.0 is peering, acting [8,6]
pg 3.2 is peering, acting [1,5]
pg 3.fe is peering, acting [0,3]
pg 3.f9 is peering, acting [8,3]
pg 3.f0 is peering, acting [7,6]
1 ops are blocked > 134218 sec
1 ops are blocked > 134218 sec on osd.8
1 osds have slow requests
pool images pg_num 256 > pgp_num 128
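With 82 pgs listed twice, it is easier to tally them than to read them. A minimal Python sketch that parses the "is stuck" lines of ceph health detail in the format shown above, counts stuck pgs per primary OSD and finds the one stuck longest. It assumes the first OSD in the acting set is the primary, which holds for the 3.a9 query below (acting [7,4], acting_primary 7) but is an assumption rather than a guarantee; summarize_stuck is a hypothetical name.

```python
import re
from collections import Counter

# Matches lines such as:
# "pg 3.21 is stuck inactive for 115937.161742, current state peering, last acting [7,5]"
STUCK_RE = re.compile(
    r"^pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]"
)


def summarize_stuck(health_detail: str, kind: str = "inactive"):
    """Count stuck pgs per (assumed) primary OSD and find the longest-stuck pg."""
    per_primary = Counter()
    longest = None  # (seconds, pgid, state)
    for line in health_detail.splitlines():
        m = STUCK_RE.match(line.strip())
        if not m or m.group("kind") != kind:
            continue  # skip other lines and the other "stuck" kind
        acting = [int(x) for x in m.group("acting").split(",") if x]
        if acting:
            per_primary[acting[0]] += 1  # assumption: first acting OSD is the primary
        secs = float(m.group("secs"))
        if longest is None or secs > longest[0]:
            longest = (secs, m.group("pgid"), m.group("state"))
    return per_primary, longest
```

Saving the output with sudo ceph health detail > health.txt and calling summarize_stuck(open("health.txt").read()) would show, for example, how many of the stuck pgs have osd.8 (the OSD with the blocked request) as primary; dividing the longest time by 3600 converts it to hours.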



And this is the output of ceph pg 3.a9 query. The stats section looks important: num_bytes_recovered (54922698769) is about three times num_bytes (18268690441), i.e. far more than the size of the pg.
{
    "state": "peering",
    "snap_trimq": "[]",
    "epoch": 211256,
    "up": [
        7,
        4
    ],
    "acting": [
        7,
        4
    ],
    "info": {
        "pgid": "3.a9",
        "last_update": "3359'110581",
        "last_complete": "3359'110581",
        "log_tail": "850'107578",
        "last_user_version": 110581,
        "last_backfill": "MAX",
        "purged_snaps": "[]",
        "history": {
            "epoch_created": 31,
            "last_epoch_started": 116841,
            "last_epoch_clean": 116844,
            "last_epoch_split": 0,
            "same_up_since": 116838,
            "same_interval_since": 126562,
            "same_primary_since": 1202,
            "last_scrub": "3359'110581",
            "last_scrub_stamp": "2015-11-13 13:22:55.682647",
            "last_deep_scrub": "987'109658",
            "last_deep_scrub_stamp": "2015-11-09 13:56:36.850047",
            "last_clean_scrub_stamp": "2015-11-13 13:22:55.682647"
        },
        "stats": {
            "version": "3359'110581",
            "reported_seq": "103843",
            "reported_epoch": "211192",
            "state": "peering",
            "last_fresh": "2015-11-15 17:45:30.009129",
            "last_change": "2015-11-14 11:25:20.451898",
            "last_active": "2015-11-14 09:35:03.312840",
            "last_peered": "2015-11-14 09:35:03.312840",
            "last_clean": "2015-11-14 09:35:03.312840",
            "last_became_active": "0.000000",
            "last_became_peered": "0.000000",
            "last_unstale": "2015-11-15 17:45:30.009129",
            "last_undegraded": "2015-11-15 17:45:30.009129",
            "last_fullsized": "2015-11-15 17:45:30.009129",
            "mapping_epoch": 94611,
            "log_start": "850'107578",
            "ondisk_log_start": "850'107578",
            "created": 31,
            "last_epoch_clean": 116844,
            "parent": "0.0",
            "parent_split_bits": 0,
            "last_scrub": "3359'110581",
            "last_scrub_stamp": "2015-11-13 13:22:55.682647",
            "last_deep_scrub": "987'109658",
            "last_deep_scrub_stamp": "2015-11-09 13:56:36.850047",
            "last_clean_scrub_stamp": "2015-11-13 13:22:55.682647",
            "log_size": 3003,
            "ondisk_log_size": 3003,
            "stats_invalid": "1",
            "stat_sum": {
                "num_bytes": 18268690441,
                "num_objects": 4402,
                "num_object_clones": 0,
                "num_object_copies": 8804,
                "num_objects_missing_on_primary": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 4402,
                "num_whiteouts": 0,
                "num_read": 2268,
                "num_read_kb": 31055,
                "num_write": 8111,
                "num_write_kb": 1762444,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 13228,
                "num_bytes_recovered": 54922698769,
                "num_keys_recovered": 0,
                "num_objects_omap": 0,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0
            },
            "up": [
                7,
                4
            ],
            "acting": [
                7,
                4
            ],
            "blocked_by": [
                4
            ],
            "up_primary": 7,
            "acting_primary": 7
        },
        "empty": 0,
        "dne": 0,
        "incomplete": 0,
        "last_epoch_started": 116841,
        "hit_set_history": {
            "current_last_update": "0'0",
            "current_last_stamp": "0.000000",
            "current_info": {
                "begin": "0.000000",
                "end": "0.000000",
                "version": "0'0"
            },
            "history": []
        }
    },
    "peer_info": [],
    "recovery_state": [
        {
            "name": "Started\/Primary\/Peering\/GetInfo",
            "enter_time": "2015-11-14 11:25:20.451888",
            "requested_info_from": [
                {
                    "osd": "4"
                }
            ]
        },
        {
            "name": "Started\/Primary\/Peering",
            "enter_time": "2015-11-14 11:25:20.451882",
            "past_intervals": [
                {
                    "first": 116838,
                    "last": 120813,
                    "maybe_went_rw": 1,
                    "up": [
                        7,
                        4
                    ],
                    "acting": [
                        7,
                        4
                    ],
                    "primary": 7,
                    "up_primary": 7
                },
                {
                    "first": 120814,
                    "last": 120889,
                    "maybe_went_rw": 1,
                    "up": [
                        7,
                        4
                    ],
                    "acting": [
                        7,
                        4
                    ],
                    "primary": 7,
                    "up_primary": 7
                },
                {
                    "first": 120890,
                    "last": 126561,
                    "maybe_went_rw": 1,
                    "up": [
                        7,
                        4
                    ],
                    "acting": [
                        7,
                        4
                    ],
                    "primary": 7,
                    "up_primary": 7
                }
            ],
            "probing_osds": [
                "4",
                "7"
            ],
            "down_osds_we_would_probe": [],
            "peering_blocked_by": []
        },
        {
            "name": "Started",
            "enter_time": "2015-11-14 11:25:20.451851"
        }
    ],
    "agent_state": {}
}
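The comparison in the stats section can be made concrete. A small Python sketch, assuming the query output has been loaded with json.load; it divides the recovered counters in info.stats.stat_sum by the pg's current size (recovery_ratios is a hypothetical helper). For 3.a9 both ratios come out at about 3 (54922698769 / 18268690441 ≈ 3.01 for bytes, 13228 / 4402 ≈ 3.00 for objects). The same output reports stats_invalid as 1, so these counters may not be exact, and reading the ratio as the pg having been copied around three times is an interpretation, not something the query states.

```python
def recovery_ratios(query: dict) -> tuple:
    """Ratio of recovered bytes/objects to the pg's current bytes/objects."""
    stat_sum = query["info"]["stats"]["stat_sum"]
    # num_bytes / num_objects describe the pg as it is now.
    bytes_ratio = stat_sum["num_bytes_recovered"] / stat_sum["num_bytes"]
    objects_ratio = stat_sum["num_objects_recovered"] / stat_sum["num_objects"]
    return bytes_ratio, objects_ratio
```

Running it over each stuck pg would show whether 3.a9 is typical or an outlier.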


Regards
Pete

On 15 November 2015 at 01:26, Peter Theobald <pete@xxxxxxxxxxxxxxx> wrote:
Hi Gregory,
This is the output of ceph -s:
    cluster 5400bbc9-378d-4c69-afc4-da71393f7baf
     health HEALTH_WARN
            82 pgs peering
            82 pgs stuck inactive
            82 pgs stuck unclean
            1 requests are blocked > 32 sec
            pool images pg_num 256 > pgp_num 128
     monmap e2: 2 mons at {0=192.168.2.1:6789/0,1=192.168.2.3:6789/0}
            election epoch 16, quorum 0,1 0,1
     osdmap e168004: 9 osds: 9 up, 9 in; 4 remapped pgs
      pgmap v1317963: 256 pgs, 1 pools, 4377 GB data, 1105 kobjects
            8792 GB used, 15369 GB / 24162 GB avail
                 174 active+clean
                  78 peering
                   4 remapped+peering


Total available space is about 24TB. Used space is about 8.8TB at a replication level of 2.
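Greg's question about the cluster filling up can be answered from the ceph -s figures alone. A worked check in Python using only the numbers printed above (GB values as reported by Ceph; capacity_check is a hypothetical helper): 4377 GB of data at 2 replicas should need about 8754 GB of raw space, only 38 GB short of the 8792 GB used, and the cluster is about 36% full.

```python
def capacity_check(data_gb: float, used_gb: float, total_gb: float, replicas: int) -> tuple:
    """Compare raw usage with what the replicas alone should need."""
    expected_raw_gb = data_gb * replicas         # every object stored `replicas` times
    unexplained_gb = used_gb - expected_raw_gb   # used space not accounted for by replicas
    fraction_used = used_gb / total_gb           # share of raw capacity in use
    return expected_raw_gb, unexplained_gb, fraction_used


# Figures from the ceph -s output above.
expected, extra, used = capacity_check(data_gb=4377, used_gb=8792, total_gb=24162, replicas=2)
```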

Regards
Pete

On 14 November 2015 at 18:03, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
What's the full output of "Ceph -s"? Are your new crush rules actually satisfiable? Is your cluster filling up?
-Greg


On Saturday, November 14, 2015, Peter Theobald <pete@xxxxxxxxxxxxxxx> wrote:
Hi list,

I have a 3 node ceph cluster with a total of 9 osds (2, 3 and 4 per node, with drives of different sizes). I changed the layout (failure domain from per osd to per host, and changed min_size), and I now have a few pgs that have been stuck in peering or remapped+peering for a couple of days.
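Whether a per-host rule is satisfiable mostly comes down to counting hosts: with a host failure domain (a step chooseleaf firstn 0 type host style rule), each replica must land on a different host. A rough Python sketch; the host names are hypothetical, only the per-node OSD counts (2, 3 and 4) come from this post, min_size=1 is a placeholder because the post does not say what value was set, and the check ignores CRUSH weights and retry limits, so passing it is necessary rather than sufficient.

```python
def host_rule_satisfiable(osds_per_host: dict, size: int, min_size: int) -> bool:
    """Rough feasibility check for a rule that puts each replica on a different host.

    Ignores CRUSH weights and retry limits, so True means "not obviously impossible".
    """
    # Only hosts with at least one (up and in) OSD can receive a replica.
    hosts_with_osds = sum(1 for count in osds_per_host.values() if count > 0)
    # Need at least `size` distinct hosts, and min_size must be between 1 and size.
    return 1 <= min_size <= size <= hosts_with_osds


# Hypothetical host names; the OSD counts per node are from the post.
layout = {"host-a": 2, "host-b": 3, "host-c": 4}
print(host_rule_satisfiable(layout, size=2, min_size=1))
```

With 3 hosts, all 9 osds up and in, and 2 replicas (the replication level given later in the thread), this simple check passes; a size of 4, or a min_size larger than the size, would fail it.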

The hosts are underpowered: 2x HP MicroServers and a single desktop-grade i5 machine, so not super powerful. The network is fast, though (bonded gigabit Ethernet with a dedicated switch).

I'm concerned that the remapped+peering pgs are stuck. All the pgs in peering or remapped+peering are stuck inactive and unclean, so I'm concerned about data loss. Do I just need to wait for them to fix themselves? I cannot see any mention of unfound objects when I query the remapped pgs, so I think I'm OK and just need to be patient. I have 128 pgs across 9 osds, so I probably have a lot of objects per pg. Total data is about 4TB.
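A rough per-pg estimate, assuming data is spread evenly, that Ceph's "GB" is 2^30 bytes and that "kobjects" means thousands of objects. The ceph -s output in the reply of 15 November reports 256 pgs (pg_num 256, pgp_num 128) rather than 128; averaging over 256 gives about 17.1 GB and 4316 objects per pg, which is close to pg 3.a9 (18268690441 bytes ≈ 17.0 GiB, 4402 objects). The helper names are hypothetical.

```python
GIB = 2**30  # assumption: the "GB" in ceph -s output means 2**30 bytes


def per_pg_average(data_gb: float, objects: int, pg_num: int) -> tuple:
    """Average GB and object count per pg, assuming an even spread."""
    return data_gb / pg_num, objects / pg_num


def pg_copies_per_osd(pg_num: int, replicas: int, num_osds: int) -> float:
    """Average number of pg replicas each OSD carries."""
    return pg_num * replicas / num_osds


DATA_GB = 4377          # from ceph -s: "4377 GB data"
OBJECTS = 1105 * 1000   # from ceph -s: "1105 kobjects" (assumed to mean thousands)
```

At 2 replicas, 256 pgs over 9 osds works out to roughly 57 pg copies per OSD (about 28 with 128 pgs).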

Regards

Pete


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
