Re: cluster recovery stuck

Hi,

if the OSDs are not too full, it's probably the CRUSH weight of those hosts and OSDs. CRUSH tries to distribute the data evenly across all three hosts because they have the same weight (9.31400), but since two OSDs are missing, the distribution can't complete. If you can't replace the failed OSDs, you could try adjusting the CRUSH weights and see if the recovery finishes.
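
A rough sketch of what that could look like (the OSD ids are taken from your tree below; the approach itself is just a suggestion): take the two dead OSDs out of the CRUSH calculation by zeroing their weight and watch whether recovery resumes:

  # zero the CRUSH weight of the failed OSDs so CRUSH stops mapping data to them
  ceph osd crush reweight osd.0 0
  ceph osd crush reweight osd.6 0

  # watch the cluster to see if recovery makes progress again
  ceph -w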

Regards,
Eugen


Quoting Andras Pataki <apataki@xxxxxxxxxxxxxxxxxxxxx>:

Hi Philipp,

Given 256 PGs triple-replicated onto 4 OSDs, you might be running into the "PG overdose protection" of the OSDs. Take a look at 'ceph osd df' and check the number of PGs mapped to each OSD (the last column, or close to it). The default limit is 200, so if any OSD exceeds that it would explain the freeze, since the OSD simply ignores the excess PGs. In that case, try increasing mon_max_pg_per_osd to, say, 400 and see if that helps. That should allow the recovery to proceed, but you should consider adding OSDs (or at least increasing the memory allocated to the OSDs above the defaults).
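
Roughly, the check and the (temporary) bump could look like this; treat it as a sketch, the exact way to change the option depends on your release:

  # the PGS column shows how many PGs each OSD currently holds
  ceph osd df

  # raise the limit on the running OSDs (runtime, if the option is supported)
  ceph tell osd.* injectargs '--mon_max_pg_per_osd 400'

  # and/or persistently in ceph.conf under [global], then restart the daemons:
  # mon_max_pg_per_osd = 400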

Andras

On 10/22/19 3:02 PM, Philipp Schwaha wrote:
hi,

On 2019-10-22 08:05, Eugen Block wrote:
Hi,

Can you share `ceph osd tree`? What CRUSH rules are in use in your
cluster? I assume the two failed OSDs prevent the remapping because
the rules can't be applied.
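
One way to check that (just a sketch; the file name is arbitrary and "rule 0" assumes the default replicated ruleset) would be to export the crushmap and let crushtool simulate the rule against the current map:

  # export the compiled crushmap
  ceph osd getcrushmap -o crushmap.bin

  # simulate rule 0 with 3 replicas and report mappings that can't be satisfied
  crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings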

ceph osd tree gives:

ID WEIGHT   TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 27.94199 root default
-2  9.31400     host alpha.local
 0  4.65700         osd.0           down        0          1.00000
 3  4.65700         osd.3             up  1.00000          1.00000
-3  9.31400     host beta.local
 1  4.65700         osd.1             up  1.00000          1.00000
 6  4.65700         osd.6           down        0          1.00000
-4  9.31400     host gamma.local
 2  4.65700         osd.2             up  1.00000          1.00000
 4  4.65700         osd.4             up  1.00000          1.00000


The CRUSH rules should be fairly simple, nothing particularly customized
as far as I can tell.
'ceph osd crush tree' gives:
[
    {
        "id": -1,
        "name": "default",
        "type": "root",
        "type_id": 10,
        "items": [
            {
                "id": -2,
                "name": "alpha.local",
                "type": "host",
                "type_id": 1,
                "items": [
                    {
                        "id": 0,
                        "name": "osd.0",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 4.656998,
                        "depth": 2
                    },
                    {
                        "id": 3,
                        "name": "osd.3",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 4.656998,
                        "depth": 2
                    }
                ]
            },
            {
                "id": -3,
                "name": "beta.local",
                "type": "host",
                "type_id": 1,
                "items": [
                    {
                        "id": 1,
                        "name": "osd.1",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 4.656998,
                        "depth": 2
                    },
                    {
                        "id": 6,
                        "name": "osd.6",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 4.656998,
                        "depth": 2
                    }
                ]
            },
            {
                "id": -4,
                "name": "gamma.local",
                "type": "host",
                "type_id": 1,
                "items": [
                    {
                        "id": 2,
                        "name": "osd.2",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 4.656998,
                        "depth": 2
                    },
                    {
                        "id": 4,
                        "name": "osd.4",
                        "type": "osd",
                        "type_id": 0,
                        "crush_weight": 4.656998,
                        "depth": 2
                    }
                ]
            }
        ]
    }
]

and 'ceph osd crush rule dump' gives:
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

The cluster actually reached HEALTH_OK after osd.0 went down, but when
osd.6 went down it did not recover. The cluster is running ceph version
10.2.2.
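
For further diagnosis, the stuck PGs can be listed and queried like this (a sketch; <pgid> is a placeholder for any affected PG id):

  # list the problematic PGs by id
  ceph health detail
  ceph pg dump_stuck inactive

  # show the peering state of one affected PG
  ceph pg <pgid> query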

Any help is greatly appreciated!

thanks & cheers
	Philipp

Quoting Philipp Schwaha <philipp@xxxxxxxxxxx>:

hi,

I have a problem with a cluster being stuck in recovery after an OSD
failure. At first recovery was doing quite well, but now it just sits
there without any progress. It currently looks like this:

     health HEALTH_ERR
            36 pgs are stuck inactive for more than 300 seconds
            50 pgs backfill_wait
            52 pgs degraded
            36 pgs down
            36 pgs peering
            1 pgs recovering
            1 pgs recovery_wait
            36 pgs stuck inactive
            52 pgs stuck unclean
            52 pgs undersized
            recovery 261632/2235446 objects degraded (11.704%)
            recovery 259813/2235446 objects misplaced (11.622%)
            recovery 2/1117723 unfound (0.000%)
     monmap e3: 3 mons at
{0=192.168.19.13:6789/0,1=192.168.19.17:6789/0,2=192.168.19.23:6789/0}
            election epoch 78, quorum 0,1,2 0,1,2
     osdmap e7430: 6 osds: 4 up, 4 in; 88 remapped pgs
            flags sortbitwise
      pgmap v20023893: 256 pgs, 1 pools, 4366 GB data, 1091 kobjects
            8421 GB used, 10183 GB / 18629 GB avail
            261632/2235446 objects degraded (11.704%)
            259813/2235446 objects misplaced (11.622%)
            2/1117723 unfound (0.000%)
                 168 active+clean
                  50 active+undersized+degraded+remapped+wait_backfill
                  36 down+remapped+peering
                   1 active+recovering+undersized+degraded+remapped
                   1 active+recovery_wait+undersized+degraded+remapped

Is there any way to motivate it to resume recovery?

Thanks
    Philipp





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



