Re: New cluster in unhealthy state


 



Hi Dave,

 

Nothing sticks out to me as the cause of the problem. If you restart one of the OSDs, is there anything obvious in its logs?

 

Apart from that, I’m out of ideas, I’m afraid.

 

Nick

 

From: Dave Durkee [mailto:dave@xxxxxxx]
Sent: 26 June 2015 17:10
To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
Subject: RE: New cluster in unhealthy state

 

Nick, I rebuilt the cluster using the following commands:

 

ceph-deploy purge admin mon osd1 osd2 osd3

ceph-deploy purgedata admin mon osd1 osd2 osd3

ceph-deploy forgetkeys

rm -f ceph.bootstrap-rgw.keyring ceph.log ceph.conf

 

ceph-deploy new mon

cat ceph.conf.add >> ceph.conf

ceph-deploy install admin mon osd1 osd2 osd3

ceph-deploy mon create-initial

ceph-deploy disk zap osd1:sdc osd1:sdd osd1:sde

ceph-deploy osd create osd1:sdc:/journal/c osd1:sdd:/journal/d osd1:sde:/journal/e

ceph-deploy admin admin mon osd1 osd2 osd3

chmod +r /etc/ceph/ceph.client.admin.keyring

ceph health
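The disk arguments in the commands above follow ceph-deploy's HOST:DISK form for "disk zap" and HOST:DISK:JOURNAL form for "osd create". When scripting a rebuild, a small generator keeps those arguments consistent. This is a sketch only: the helper names are hypothetical, and the journal naming rule (the journal file is named after the last letter of its data disk, so sdc maps to /journal/c) is inferred from the commands above rather than required by ceph-deploy.

```python
# Hypothetical helpers that rebuild the ceph-deploy argument lists used above.
# ASSUMPTION (inferred from the commands above): each journal file is named
# after the last letter of its data disk, e.g. sdc -> /journal/c.


def zap_args(host, disks):
    """HOST:DISK arguments for 'ceph-deploy disk zap'."""
    return [f"{host}:{disk}" for disk in disks]


def osd_create_args(host, disks, journal_dir="/journal"):
    """HOST:DISK:JOURNAL arguments for 'ceph-deploy osd create'."""
    return [f"{host}:{disk}:{journal_dir}/{disk[-1]}" for disk in disks]


def osd_commands(host, disks):
    """Return the zap and create command lines (as argv lists) for one host."""
    return [
        ["ceph-deploy", "disk", "zap", *zap_args(host, disks)],
        ["ceph-deploy", "osd", "create", *osd_create_args(host, disks)],
    ]


if __name__ == "__main__":
    # Only prints the commands; nothing is executed against the cluster.
    for argv in osd_commands("osd1", ["sdc", "sdd", "sde"]):
        print(" ".join(argv))
```

For osd1 with sdc, sdd and sde, this prints the same zap and create lines as in the list above.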

 

I received no errors during the above process.

 

Here is a copy of ceph.conf:

[global]

auth_service_required = cephx

filestore_xattr_use_omap = true

auth_client_required = cephx

auth_cluster_required = cephx

mon_host = 172.17.1.16

mon_initial_members = mon

fsid = f070bdc0-ccff-4d1d-bb3e-071d695ed629

 

osd pool default size = 2

public network = 172.17.1.0/24

cluster network = 10.0.0.0/24
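This ceph.conf mixes two spellings of option names: underscores (mon_host, mon_initial_members) and spaces (osd pool default size, public network). Ceph treats spaces and underscores in option names as equivalent, so both styles work. A minimal sketch for reading such a file and normalising the names, assuming Python's configparser is an acceptable stand-in for Ceph's own parser (it is not the same parser and may differ on details such as comment handling):

```python
import configparser


def load_ceph_conf(text):
    """Parse ceph.conf text, normalising option names to underscore form."""
    # interpolation=None so any '%' in a value is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    conf = {}
    for section in parser.sections():
        conf[section] = {
            # "osd pool default size" and "osd_pool_default_size" share one key.
            "_".join(name.split()): value
            for name, value in parser.items(section)
        }
    return conf


if __name__ == "__main__":
    sample = (
        "[global]\n"
        "mon_host = 172.17.1.16\n"
        "osd pool default size = 2\n"
        "public network = 172.17.1.0/24\n"
        "cluster network = 10.0.0.0/24\n"
    )
    glob = load_ceph_conf(sample)["global"]
    print(glob["osd_pool_default_size"], glob["public_network"])
```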

 

Running ceph health detail produces the following output:

HEALTH_WARN 24 pgs degraded; 24 pgs stuck degraded; 64 pgs stuck unclean; 24 pgs stuck undersized; 24 pgs undersized; too few PGs per OSD (21 < min 30)

pg 0.22 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.21 is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.20 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.1f is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.1e is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.1d is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.1c is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.1b is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.1a is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.19 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.18 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.17 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.16 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.15 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.14 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.13 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.12 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.11 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.10 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.f is stuck unclean since forever, current state active, last acting [2,1]

pg 0.e is stuck unclean since forever, current state active, last acting [2,1]

pg 0.d is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.c is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.b is stuck unclean since forever, current state active, last acting [2,1]

pg 0.a is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.9 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.8 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.7 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.6 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.5 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.4 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.3 is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.2 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.1 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.0 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.3f is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.3e is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.3d is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.3c is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.3b is stuck unclean since forever, current state active, last acting [2,1]

pg 0.3a is stuck unclean since forever, current state active, last acting [2,1]

pg 0.39 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.38 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.37 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.36 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.35 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.34 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.33 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.32 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.31 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.30 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.2f is stuck unclean since forever, current state active, last acting [2,1]

pg 0.2e is stuck unclean since forever, current state active, last acting [2,1]

pg 0.2d is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.2c is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.2b is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.2a is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.29 is stuck unclean since forever, current state active+undersized+degraded, last acting [0]

pg 0.28 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.27 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.26 is stuck unclean since forever, current state active+remapped, last acting [1,0]

pg 0.25 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.24 is stuck unclean since forever, current state active, last acting [2,1]

pg 0.23 is stuck unclean since forever, current state active+remapped, last acting [2,0]

pg 0.1f is stuck undersized for 1650.474153, current state active+undersized+degraded, last acting [0]

pg 0.1e is stuck undersized for 1650.488904, current state active+undersized+degraded, last acting [0]

pg 0.1c is stuck undersized for 1650.489953, current state active+undersized+degraded, last acting [0]

pg 0.19 is stuck undersized for 1650.491760, current state active+undersized+degraded, last acting [0]

pg 0.17 is stuck undersized for 1650.492908, current state active+undersized+degraded, last acting [0]

pg 0.16 is stuck undersized for 1650.493515, current state active+undersized+degraded, last acting [0]

pg 0.11 is stuck undersized for 1650.496410, current state active+undersized+degraded, last acting [0]

pg 0.10 is stuck undersized for 1650.497174, current state active+undersized+degraded, last acting [0]

pg 0.c is stuck undersized for 1650.499547, current state active+undersized+degraded, last acting [0]

pg 0.a is stuck undersized for 1650.500749, current state active+undersized+degraded, last acting [0]

pg 0.6 is stuck undersized for 1650.503065, current state active+undersized+degraded, last acting [0]

pg 0.5 is stuck undersized for 1650.503638, current state active+undersized+degraded, last acting [0]

pg 0.4 is stuck undersized for 1650.504332, current state active+undersized+degraded, last acting [0]

pg 0.2 is stuck undersized for 1650.517230, current state active+undersized+degraded, last acting [0]

pg 0.0 is stuck undersized for 1650.518257, current state active+undersized+degraded, last acting [0]

pg 0.3c is stuck undersized for 1649.521029, current state active+undersized+degraded, last acting [0]

pg 0.39 is stuck undersized for 1649.704285, current state active+undersized+degraded, last acting [0]

pg 0.38 is stuck undersized for 1649.896024, current state active+undersized+degraded, last acting [0]

pg 0.36 is stuck undersized for 1650.037711, current state active+undersized+degraded, last acting [0]

pg 0.33 is stuck undersized for 1650.212724, current state active+undersized+degraded, last acting [0]

pg 0.2c is stuck undersized for 1650.466254, current state active+undersized+degraded, last acting [0]

pg 0.2b is stuck undersized for 1650.467129, current state active+undersized+degraded, last acting [0]

pg 0.2a is stuck undersized for 1650.467775, current state active+undersized+degraded, last acting [0]

pg 0.29 is stuck undersized for 1650.468388, current state active+undersized+degraded, last acting [0]

pg 0.1f is stuck degraded for 1650.474301, current state active+undersized+degraded, last acting [0]

pg 0.1e is stuck degraded for 1650.489051, current state active+undersized+degraded, last acting [0]

pg 0.1c is stuck degraded for 1650.490100, current state active+undersized+degraded, last acting [0]

pg 0.19 is stuck degraded for 1650.491908, current state active+undersized+degraded, last acting [0]

pg 0.17 is stuck degraded for 1650.493055, current state active+undersized+degraded, last acting [0]

pg 0.16 is stuck degraded for 1650.493662, current state active+undersized+degraded, last acting [0]

pg 0.11 is stuck degraded for 1650.496557, current state active+undersized+degraded, last acting [0]

pg 0.10 is stuck degraded for 1650.497321, current state active+undersized+degraded, last acting [0]

pg 0.c is stuck degraded for 1650.499694, current state active+undersized+degraded, last acting [0]

pg 0.a is stuck degraded for 1650.500897, current state active+undersized+degraded, last acting [0]

pg 0.6 is stuck degraded for 1650.503213, current state active+undersized+degraded, last acting [0]

pg 0.5 is stuck degraded for 1650.503786, current state active+undersized+degraded, last acting [0]

pg 0.4 is stuck degraded for 1650.504480, current state active+undersized+degraded, last acting [0]

pg 0.2 is stuck degraded for 1650.517378, current state active+undersized+degraded, last acting [0]

pg 0.0 is stuck degraded for 1650.518404, current state active+undersized+degraded, last acting [0]

pg 0.3c is stuck degraded for 1649.521177, current state active+undersized+degraded, last acting [0]

pg 0.39 is stuck degraded for 1649.704432, current state active+undersized+degraded, last acting [0]

pg 0.38 is stuck degraded for 1649.896170, current state active+undersized+degraded, last acting [0]

pg 0.36 is stuck degraded for 1650.037859, current state active+undersized+degraded, last acting [0]

pg 0.33 is stuck degraded for 1650.212872, current state active+undersized+degraded, last acting [0]

pg 0.2c is stuck degraded for 1650.466402, current state active+undersized+degraded, last acting [0]

pg 0.2b is stuck degraded for 1650.467276, current state active+undersized+degraded, last acting [0]

pg 0.2a is stuck degraded for 1650.467922, current state active+undersized+degraded, last acting [0]

pg 0.29 is stuck degraded for 1650.468535, current state active+undersized+degraded, last acting [0]

pg 0.1f is active+undersized+degraded, acting [0]

pg 0.1e is active+undersized+degraded, acting [0]

pg 0.1c is active+undersized+degraded, acting [0]

pg 0.19 is active+undersized+degraded, acting [0]

pg 0.17 is active+undersized+degraded, acting [0]

pg 0.16 is active+undersized+degraded, acting [0]

pg 0.11 is active+undersized+degraded, acting [0]

pg 0.10 is active+undersized+degraded, acting [0]

pg 0.c is active+undersized+degraded, acting [0]

pg 0.a is active+undersized+degraded, acting [0]

pg 0.6 is active+undersized+degraded, acting [0]

pg 0.5 is active+undersized+degraded, acting [0]

pg 0.4 is active+undersized+degraded, acting [0]

pg 0.2 is active+undersized+degraded, acting [0]

pg 0.0 is active+undersized+degraded, acting [0]

pg 0.3c is active+undersized+degraded, acting [0]

pg 0.39 is active+undersized+degraded, acting [0]

pg 0.38 is active+undersized+degraded, acting [0]

pg 0.36 is active+undersized+degraded, acting [0]

pg 0.33 is active+undersized+degraded, acting [0]

pg 0.2c is active+undersized+degraded, acting [0]

pg 0.2b is active+undersized+degraded, acting [0]

pg 0.2a is active+undersized+degraded, acting [0]

pg 0.29 is active+undersized+degraded, acting [0]

too few PGs per OSD (21 < min 30)
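The detail output is long but regular, so it can be summarised mechanically. The sketch below tallies the "stuck unclean" lines by state and by the number of OSDs in the acting set. It also includes a hypothetical pgs_per_osd helper for the last warning; its formula (total OSD slots across all PG up sets, divided by the number of OSDs that are in, using integer division) is an ASSUMPTION about how the monitor computes the figure. It does reproduce the reported 21 if each of the 64 PGs has a single OSD in its up set, which is what the up column of the pg dump below shows.

```python
import re
from collections import Counter

# Matches the "stuck unclean" lines of `ceph health detail`, e.g.
# pg 0.22 is stuck unclean since forever, current state active+remapped, last acting [1,0]
STUCK_UNCLEAN = re.compile(
    r"^pg (?P<pgid>\S+) is stuck unclean since .+?, "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]$"
)


def tally_unclean(lines):
    """Count stuck-unclean PGs by state and by number of OSDs acting."""
    states = Counter()
    acting_sizes = Counter()
    for line in lines:
        match = STUCK_UNCLEAN.match(line.strip())
        if match is None:
            continue  # summary, undersized/degraded, per-PG and blank lines
        states[match["state"]] += 1
        acting = [osd for osd in match["acting"].split(",") if osd]
        acting_sizes[len(acting)] += 1
    return states, acting_sizes


def pgs_per_osd(up_set_sizes, num_in_osds):
    """ASSUMED formula: total OSD slots in all up sets // OSDs that are in."""
    return sum(up_set_sizes) // num_in_osds


if __name__ == "__main__":
    sample = [
        "pg 0.22 is stuck unclean since forever, current state active+remapped, last acting [1,0]",
        "pg 0.1f is stuck unclean since forever, current state active+undersized+degraded, last acting [0]",
        "pg 0.15 is stuck unclean since forever, current state active, last acting [2,1]",
        "pg 0.1f is active+undersized+degraded, acting [0]",
    ]
    states, sizes = tally_unclean(sample)
    print(dict(states))
    print(dict(sizes))
    print(pgs_per_osd([1] * 64, 3))  # 64 // 3 -> 21
    print(pgs_per_osd([2] * 64, 3))  # 128 // 3 -> 42
```

Applied to the 64 "stuck unclean" lines above, the tally comes to 25 active+remapped, 24 active+undersized+degraded and 15 active PGs.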

 

Each OSD is a single 500 GB disk, with its journal on a file system on another disk. I verified the network on all of the hosts and everything is fine. I am not using jumbo frames yet, as I want to get everything working with standard gigabit networking first. The mon and OSD hosts each have two NICs, separated by VLAN tagging. I have configured a public network and a cluster network: the public network is 172.17.1.0/24 and the cluster network is 10.0.0.0/24.
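One quick sanity check when splitting traffic like this is to confirm that each host address falls in the subnet intended for it. A minimal sketch using Python's ipaddress module, with the two subnets taken from the ceph.conf above; 172.17.1.16 is the mon_host address from that file, while the other addresses in the example are hypothetical.

```python
import ipaddress

# Subnets as configured in the ceph.conf above.
PUBLIC_NET = ipaddress.ip_network("172.17.1.0/24")
CLUSTER_NET = ipaddress.ip_network("10.0.0.0/24")


def classify(address):
    """Return 'public', 'cluster', or None for an IPv4/IPv6 address string."""
    ip = ipaddress.ip_address(address)
    if ip in PUBLIC_NET:
        return "public"
    if ip in CLUSTER_NET:
        return "cluster"
    return None


if __name__ == "__main__":
    # 172.17.1.16 is mon_host; the other two addresses are hypothetical.
    for addr in ["172.17.1.16", "10.0.0.5", "192.168.1.10"]:
        print(addr, classify(addr))
```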

 

The hosts file on each node has entries only for the public network names and IPs.

 

Here is the output of ceph pg dump:

dumped all in format plain

version 25

stamp 2015-06-26 09:20:31.526802

last_osdmap_epoch 15

last_pg_scan 1

full_ratio 0.95

nearfull_ratio 0.85

pg_stat      objects      mip      degr      misp      unf      bytes      log      disklog      state      state_stamp      v      reported      up      up_primary      acting      acting_primary      last_scrub      scrub_stamp      last_deep_scrub      deep_scrub_stamp

0.22        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.724003                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215311         0'0                2015-06-26 09:17:31.215311

0.21        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:27.372001                0'0          15:8        [2]          2              [2,0]       2              0'0          2015-06-26 09:17:31.215308         0'0                2015-06-26 09:17:31.215308

0.20        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.720347                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215305         0'0                2015-06-26 09:17:31.215305

0.1f        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.674405                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215303         0'0                2015-06-26 09:17:31.215303

0.1e       0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.674307                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215300         0'0                2015-06-26 09:17:31.215300

0.1d       0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:27.371586                0'0          15:8        [2]          2              [2,0]       2              0'0          2015-06-26 09:17:31.215297         0'0                2015-06-26 09:17:31.215297

0.1c        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.674360                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215294         0'0                2015-06-26 09:17:31.215294

0.1b       0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.719656                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215289         0'0                2015-06-26 09:17:31.215289

0.1a        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:27.369325                0'0          15:8        [2]          2              [2,0]       2              0'0          2015-06-26 09:17:31.215286         0'0                2015-06-26 09:17:31.215286

0.19        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.758903                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215283         0'0                2015-06-26 09:17:31.215283

0.18        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.764594                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215281         0'0                2015-06-26 09:17:31.215281

0.17        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.758140                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215278         0'0                2015-06-26 09:17:31.215278

0.16        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.758084                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215275         0'0                2015-06-26 09:17:31.215275

0.15        0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.370381         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215272         0'0          2015-06-26 09:17:31.215272

0.14        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.847404                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215267         0'0                2015-06-26 09:17:31.215267

0.13        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.846842                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215264         0'0                2015-06-26 09:17:31.215264

0.12        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.846877                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215262         0'0                2015-06-26 09:17:31.215262

0.11        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.756301                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215259         0'0                2015-06-26 09:17:31.215259

0.10        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.756269                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215256         0'0                2015-06-26 09:17:31.215256

0.f           0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.366841         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215253         0'0          2015-06-26 09:17:31.215253

0.e          0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.365962         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215250         0'0          2015-06-26 09:17:31.215250

0.d          0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.768589                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215247         0'0                2015-06-26 09:17:31.215247

0.c          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.756108                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215244         0'0                2015-06-26 09:17:31.215244

0.b          0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.372110         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215241         0'0          2015-06-26 09:17:31.215241

0.a          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.756028                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215238         0'0                2015-06-26 09:17:31.215238

0.9          0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.767778                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215235         0'0                2015-06-26 09:17:31.215235

0.8          0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.767963                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215232         0'0                2015-06-26 09:17:31.215232

0.7          0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.767434                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215229         0'0                2015-06-26 09:17:31.215229

0.6          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.683564                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215227         0'0                2015-06-26 09:17:31.215227

0.5          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.683782                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215224         0'0                2015-06-26 09:17:31.215224

0.4          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.683467                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215221         0'0                2015-06-26 09:17:31.215221

0.3          0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:27.370531                0'0          15:8        [2]          2              [2,0]       2              0'0          2015-06-26 09:17:31.215218         0'0                2015-06-26 09:17:31.215218

0.2          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.684159                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215215         0'0                2015-06-26 09:17:31.215215

0.1          0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.368246         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215212         0'0          2015-06-26 09:17:31.215212

0.0          0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.683089                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215207         0'0                2015-06-26 09:17:31.215207

0.3f        0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:27.368012                0'0          15:8        [2]          2              [2,0]       2              0'0          2015-06-26 09:17:31.215426         0'0                2015-06-26 09:17:31.215426

0.3e       0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:27.367199                0'0          15:8        [2]          2              [2,0]       2              0'0          2015-06-26 09:17:31.215424         0'0                2015-06-26 09:17:31.215424

0.3d       0              0              0              0              0              0              0              0              active+remapped            2015-06-26 09:18:17.719736                0'0          10:8        [1]          1              [1,0]       1              0'0          2015-06-26 09:17:31.215420         0'0                2015-06-26 09:17:31.215420

0.3c        0              0              0              0              0              0              0              0              active+undersized+degraded    2015-06-26 09:18:03.682134                0'0          5:4          [0]          0              [0]          0              0'0          2015-06-26 09:17:31.215418         0'0                2015-06-26 09:17:31.215418

0.3b       0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.368153         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215415         0'0          2015-06-26 09:17:31.215415

0.3a        0              0              0              0              0              0              0              0              active    2015-06-26 09:18:27.372249         0'0                15:6        [2]          2              [2,1]       2              0'0          2015-06-26 09:17:31.215412         0'0          2015-06-26 09:17:31.215412

0.39  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.676675  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215409  0'0  2015-06-26 09:17:31.215409
0.38  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.677069  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215406  0'0  2015-06-26 09:17:31.215406
0.37  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:17.765897  0'0  10:8  [1]  1  [1,0]  1  0'0  2015-06-26 09:17:31.215403  0'0  2015-06-26 09:17:31.215403
0.36  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.676314  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215400  0'0  2015-06-26 09:17:31.215400
0.35  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.371676  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215398  0'0  2015-06-26 09:17:31.215398
0.34  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:17.765998  0'0  10:8  [1]  1  [1,0]  1  0'0  2015-06-26 09:17:31.215395  0'0  2015-06-26 09:17:31.215395
0.33  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.675119  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215392  0'0  2015-06-26 09:17:31.215392
0.32  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.370631  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215389  0'0  2015-06-26 09:17:31.215389
0.31  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:17.764606  0'0  10:8  [1]  1  [1,0]  1  0'0  2015-06-26 09:17:31.215358  0'0  2015-06-26 09:17:31.215358
0.30  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.368870  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215355  0'0  2015-06-26 09:17:31.215355
0.2f  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.368309  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215352  0'0  2015-06-26 09:17:31.215352
0.2e  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.370069  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215349  0'0  2015-06-26 09:17:31.215349
0.2d  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:27.369450  0'0  15:8  [2]  2  [2,0]  2  0'0  2015-06-26 09:17:31.215346  0'0  2015-06-26 09:17:31.215346
0.2c  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.682415  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215342  0'0  2015-06-26 09:17:31.215342
0.2b  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.682323  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215339  0'0  2015-06-26 09:17:31.215339
0.2a  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.682232  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215336  0'0  2015-06-26 09:17:31.215336
0.29  0  0  0  0  0  0  0  0  active+undersized+degraded  2015-06-26 09:18:03.677627  0'0  5:4   [0]  0  [0]    0  0'0  2015-06-26 09:17:31.215333  0'0  2015-06-26 09:17:31.215333
0.28  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.367862  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215330  0'0  2015-06-26 09:17:31.215330
0.27  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:17.721827  0'0  10:8  [1]  1  [1,0]  1  0'0  2015-06-26 09:17:31.215327  0'0  2015-06-26 09:17:31.215327
0.26  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:17.721741  0'0  10:8  [1]  1  [1,0]  1  0'0  2015-06-26 09:17:31.215324  0'0  2015-06-26 09:17:31.215324
0.25  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.365162  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215322  0'0  2015-06-26 09:17:31.215322
0.24  0  0  0  0  0  0  0  0  active                      2015-06-26 09:18:27.366513  0'0  15:6  [2]  2  [2,1]  2  0'0  2015-06-26 09:17:31.215319  0'0  2015-06-26 09:17:31.215319
0.23  0  0  0  0  0  0  0  0  active+remapped             2015-06-26 09:18:27.365571  0'0  15:8  [2]  2  [2,0]  2  0'0  2015-06-26 09:17:31.215314  0'0  2015-06-26 09:17:31.215314

pool 0  0  0  0  0  0  0  0  0
sum     0  0  0  0  0  0  0  0

osdstat  kbused  kbavail     kb          hb in  hb out
0        34348   488112724   488147072   [1]    []
1        33860   488113212   488147072   [0,2]  []
2        33712   488113360   488147072   [0,1]  []
sum      101920  1464339296  1464441216
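The suggestion quoted further down (run ceph pg dump and look for a pattern) can be done mechanically. Below is a minimal Python sketch. It assumes the whitespace-separated column order of the rows above: pgid, eight counters, state, a two-part state stamp, v, reported, up, up_primary, acting, and so on. The function names are hypothetical, and the sample rows are copied from the dump. In every row shown, the up set holds a single OSD even though the pool size is 2. That is roughly what one would expect if CRUSH cannot find a second failure domain for the replica (under the default replicated rule, a second host), though the sketch only counts the pattern and does not prove the cause.

```python
from collections import Counter


def summarize_pg_dump(lines):
    """Count PGs per (state, up set, acting set) in plain 'ceph pg dump' rows.

    Assumes the column order of the rows above: pgid, eight counters, state,
    state stamp (date and time), v, reported, up, up_primary, acting, ...
    """
    groups = Counter()
    for line in lines:
        fields = line.split()
        # PG ids look like "0.2f"; this skips blank, pool, sum and osdstat lines.
        if len(fields) < 17 or "." not in fields[0]:
            continue
        state, up, acting = fields[9], fields[14], fields[16]
        groups[(state, up, acting)] += 1
    return groups


def short_up_sets(groups, pool_size=2):
    """Number of PGs whose up set holds fewer OSDs than the pool size."""
    total = 0
    for (_, up, _), count in groups.items():
        osds = [osd for osd in up.strip("[]").split(",") if osd]
        if len(osds) < pool_size:
            total += count
    return total


# A few rows copied from the dump above, plus the pool line to show it is skipped.
sample = [
    "0.39 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-06-26 09:18:03.676675 0'0 5:4 [0] 0 [0] 0 0'0 2015-06-26 09:17:31.215409 0'0 2015-06-26 09:17:31.215409",
    "0.37 0 0 0 0 0 0 0 0 active+remapped 2015-06-26 09:18:17.765897 0'0 10:8 [1] 1 [1,0] 1 0'0 2015-06-26 09:17:31.215403 0'0 2015-06-26 09:17:31.215403",
    "0.35 0 0 0 0 0 0 0 0 active 2015-06-26 09:18:27.371676 0'0 15:6 [2] 2 [2,1] 2 0'0 2015-06-26 09:17:31.215398 0'0 2015-06-26 09:17:31.215398",
    "0.2d 0 0 0 0 0 0 0 0 active+remapped 2015-06-26 09:18:27.369450 0'0 15:8 [2] 2 [2,0] 2 0'0 2015-06-26 09:17:31.215346 0'0 2015-06-26 09:17:31.215346",
    "pool 0 0 0 0 0 0 0 0 0",
]

groups = summarize_pg_dump(sample)
for (state, up, acting), count in sorted(groups.items()):
    print(f"{count:3d}  {state:28s} up={up:6s} acting={acting}")
print("PGs with a short up set:", short_up_sets(groups))
```

Saving the full output (for example with ceph pg dump > pgdump.txt) and passing in the file's lines groups every PG the same way.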

 

What are the steps I should take to bring the cluster into a healthy state?  Is now the time to run ‘ceph osd pool set rbd pg_num 64’?
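The "too few PGs per OSD (21 < min 30)" warning quoted at the top of this message is simple arithmetic. Below is a minimal sketch that rests on two assumptions. First, it assumes the warning in this release divides the total PG count by the number of in OSDs without counting replicas, which matches 64 / 3 = 21 here. Second, it uses the rule of thumb from the Ceph placement-group documentation: about 100 PGs per OSD, divided by the replica count, rounded up to a power of two. The same documentation also suggests 128 for clusters with fewer than 5 OSDs. The helper names are hypothetical.

```python
import math

MIN_PER_OSD = 30  # the "min 30" quoted in the health warning


def pgs_per_osd(pg_num: int, num_in_osds: int) -> int:
    """Per-OSD figure as assumed for the warning: total PGs // in OSDs,
    replicas not counted (64 // 3 == 21 matches the message above)."""
    return pg_num // num_in_osds


def suggested_pg_num(num_osds: int, pool_size: int, target_per_osd: int = 100) -> int:
    """Rule of thumb: (OSDs * target) / replicas, rounded up to a power of two."""
    raw = num_osds * target_per_osd / pool_size
    return 1 << math.ceil(math.log2(raw))


for pg_num in (64, 128):
    per_osd = pgs_per_osd(pg_num, 3)
    verdict = "ok" if per_osd >= MIN_PER_OSD else "too few"
    print(f"pg_num {pg_num}: {per_osd} PGs per OSD ({verdict})")

print("rule-of-thumb pg_num for 3 OSDs, size 2:", suggested_pg_num(3, 2))
```

The 21-per-OSD figure suggests the rbd pool already has 64 PGs. Under these assumptions, setting pg_num to 64 cannot clear the warning, while 128 or more would. pgp_num then has to be raised to the same value, as discussed further down this thread. This only addresses the PG-count warning and would not by itself explain the undersized and remapped PGs.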

 

Thanks for your help!

 

Best,

 

Dave Durkee

From: Nick Fisk [mailto:nick@xxxxxxxxxx]
Sent: Tuesday, June 23, 2015 12:35 AM
To: Dave Durkee; ceph-users@xxxxxxxxxxxxxx
Subject: RE: New cluster in unhealthy state

 

OK, some things to check/confirm:



- Make sure all your networking is OK. We have seen lots of problems caused by jumbo frames not being configured correctly across nodes/switches. Test by pinging with large packets between hosts, on both the separate public and cluster networks.

- Run ceph health detail. Does it show anything interesting?

- Is your pool definitely a 2-way replication pool?

- Run ceph pg dump. Can you see a pattern amongst the PGs that have problems?
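The large-packet ping test can be made exact. To fill a 9000-byte jumbo MTU without fragmenting, the ICMP payload must equal the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. Below is a minimal sketch that only builds the commands. It assumes IPv4, Linux iputils ping (where -M do forbids fragmentation and -s sets the payload size) and a 9000-byte MTU. 10.0.0.2 is a hypothetical cluster-network address.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def jumbo_ping_cmd(host: str, mtu: int = 9000, count: int = 3) -> list[str]:
    """iputils ping: '-M do' forbids fragmentation, '-s' sets the payload size."""
    return ["ping", "-M", "do", "-s", str(max_icmp_payload(mtu)), "-c", str(count), host]


# 172.17.1.16 is the mon's public address from ceph.conf in this thread;
# 10.0.0.2 is a hypothetical cluster-network address, substitute your own.
for host in ("172.17.1.16", "10.0.0.2"):
    print(" ".join(jumbo_ping_cmd(host)))
```

Run each command from every node towards every other node, on both networks. If the jumbo-sized ping fails while a default-sized ping to the same address works, something on that path is not passing jumbo frames.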

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Dave Durkee
Sent: 22 June 2015 17:27
To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] New cluster in unhealthy state

 

Nick, I removed the failed OSDs, yet the cluster is still in the same state.

 

ceph> status

    cluster b4419183-5320-4701-aae2-eb61e186b443

     health HEALTH_WARN

            32 pgs degraded

            64 pgs stale

            32 pgs stuck degraded

            246 pgs stuck inactive

            64 pgs stuck stale

            310 pgs stuck unclean

            32 pgs stuck undersized

            32 pgs undersized

            pool rbd pg_num 310 > pgp_num 64

     monmap e1: 1 mons at {mon=172.17.1.16:6789/0}

            election epoch 1, quorum 0 mon

     osdmap e82: 9 osds: 9 up, 9 in

      pgmap v196: 310 pgs, 1 pools, 0 bytes data, 0 objects

            303 MB used, 4189 GB / 4189 GB avail

                 246 creating

                  32 stale+active+undersized+degraded

                  32 stale+active+remapped

 

ceph> osd tree

ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY

-1 4.04997 root default                                    

-2 1.34999     host osd1                                   

 2 0.45000         osd.2       up  1.00000          1.00000

 3 0.45000         osd.3       up  1.00000          1.00000

10 0.45000         osd.10      up  1.00000          1.00000

-3 1.34999     host osd2                                   

 4 0.45000         osd.4       up  1.00000          1.00000

 5 0.45000         osd.5       up  1.00000          1.00000

 6 0.45000         osd.6       up  1.00000          1.00000

-4 1.34999     host osd3                                   

 7 0.45000         osd.7       up  1.00000          1.00000

 8 0.45000         osd.8       up  1.00000          1.00000

 9 0.45000         osd.9       up  1.00000          1.00000

 

ceph> osd pool set rbd pgp_num 310

Error: 16 EBUSY

Status:

currently creating pgs, wait

ceph>

 

Dave Durkee

From: Nick Fisk [mailto:nick@xxxxxxxxxx]
Sent: Saturday, June 20, 2015 9:17 AM
To: Dave Durkee; ceph-users@xxxxxxxxxxxxxx
Subject: RE: New cluster in unhealthy state

 

Hi Dave,

 

It can’t increase pgp_num because the PGs are still being created. I can see you currently have 2 OSDs down. I’m not 100% certain this is the cause, but you might want to try to get them back online, or remove them if they no longer exist.
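The manual removal Nick mentions follows the sequence in the Ceph "Adding/Removing OSDs" documentation: mark the OSD out, stop its daemon, remove it from the CRUSH map, delete its authentication key, then remove it from the OSD map. Below is a minimal sketch that only prints those commands for osd.0 and osd.1, the two OSDs shown as down in the osd tree quoted below. Stopping the daemon depends on the init system, so it is left as a comment rather than guessed. The function name is hypothetical.

```python
def removal_commands(osd_id: int) -> list[str]:
    """Commands that remove one OSD from the cluster maps, in order.

    Stop the ceph-osd daemon on its host (init-system specific, not shown)
    after the 'out' step and before the remaining steps.
    """
    name = f"osd.{osd_id}"
    return [
        f"ceph osd out {osd_id}",
        f"ceph osd crush remove {name}",
        f"ceph auth del {name}",
        f"ceph osd rm {osd_id}",
    ]


# osd.0 and osd.1 are reported down in the 'ceph osd tree' output quoted below.
for osd_id in (0, 1):
    print("\n".join(removal_commands(osd_id)))
```

The status Dave posted after removing them (osdmap e82: 9 osds: 9 up, 9 in, with osd.0 and osd.1 gone from the tree) is what this sequence should leave behind.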

 

Nick

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Dave Durkee
Sent: 19 June 2015 23:39
To: Nick Fisk; ceph-users@xxxxxxxxxxxxxx
Subject: Re: New cluster in unhealthy state

 

ceph> osd pool set rbd pgp_num 310

Error: 16 EBUSY

Status:

currently creating pgs, wait

 

What does the above mean?

 

Dave Durkee

From: Nick Fisk [mailto:nick@xxxxxxxxxx]
Sent: Friday, June 19, 2015 4:02 PM
To: Dave Durkee; ceph-users@xxxxxxxxxxxxxx
Subject: RE: New cluster in unhealthy state

 

Try

ceph osd pool set rbd pgp_num 310

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Dave Durkee
Sent: 19 June 2015 22:31
To: ceph-users@xxxxxxxxxxxxxx
Subject: New cluster in unhealthy state

 

I just built a small lab cluster: 1 mon node, 3 OSD nodes (each with 3 Ceph disks and 1 OS/journal disk), an admin VM and 3 client VMs.



I followed the preflight and install instructions, and when I finished adding the OSDs I ran ceph status and got the following:

 

ceph> status

    cluster b4419183-5320-4701-aae2-eb61e186b443

     health HEALTH_WARN

            32 pgs degraded

            64 pgs stale

            32 pgs stuck degraded

            246 pgs stuck inactive

            64 pgs stuck stale

            310 pgs stuck unclean

            32 pgs stuck undersized

            32 pgs undersized

            pool rbd pg_num 310 > pgp_num 64

     monmap e1: 1 mons at {mon=172.17.1.16:6789/0}

            election epoch 2, quorum 0 mon

     osdmap e49: 11 osds: 9 up, 9 in

      pgmap v122: 310 pgs, 1 pools, 0 bytes data, 0 objects

            298 MB used, 4189 GB / 4189 GB avail

                 246 creating

                  32 stale+active+undersized+degraded

                  32 stale+active+remapped

 

ceph> health

HEALTH_WARN 32 pgs degraded; 64 pgs stale; 32 pgs stuck degraded; 246 pgs stuck inactive; 64 pgs stuck stale; 310 pgs stuck unclean; 32 pgs stuck undersized; 32 pgs undersized; pool rbd pg_num 310 > pgp_num 64
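The one-line health summary is a semicolon-separated list, so it can be split into counts. Below is a minimal sketch (the function name is hypothetical) that parses the line above. It shows that the 246 "stuck inactive" PGs match the 246 PGs still "creating" in the status output. That is consistent with the monitor refusing the pgp_num change with EBUSY ("currently creating pgs, wait"), as discussed further up the thread.

```python
import re


def parse_health(line: str):
    """Split a one-line 'ceph health' summary into its status,
    a dict of '<n> pgs <state>' counts, and the remaining items."""
    status, _, rest = line.partition(" ")
    counts, other = {}, []
    for item in rest.split(";"):
        item = item.strip()
        match = re.fullmatch(r"(\d+) pgs (.+)", item)
        if match:
            counts[match.group(2)] = int(match.group(1))
        elif item:
            other.append(item)
    return status, counts, other


health = ("HEALTH_WARN 32 pgs degraded; 64 pgs stale; 32 pgs stuck degraded; "
          "246 pgs stuck inactive; 64 pgs stuck stale; 310 pgs stuck unclean; "
          "32 pgs stuck undersized; 32 pgs undersized; pool rbd pg_num 310 > pgp_num 64")
status, counts, other = parse_health(health)
print(status)                    # HEALTH_WARN
print(counts["stuck inactive"])  # 246
print(other)                     # ['pool rbd pg_num 310 > pgp_num 64']
```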

 

ceph> quorum_status

{"election_epoch":2,"quorum":[0],"quorum_names":["mon"],"quorum_leader_name":"mon","monmap":{"epoch":1,"fsid":"b4419183-5320-4701-aae2-eb61e186b443","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"mon","addr":"172.17.1.16:6789\/0"}]}}

 

ceph> mon_status

{"name":"mon","rank":0,"state":"leader","election_epoch":2,"quorum":[0],"outside_quorum":[],"extra_probe_peers":[],"sync_provider":[],"monmap":{"epoch":1,"fsid":"b4419183-5320-4701-aae2-eb61e186b443","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"mon","addr":"172.17.1.16:6789\/0"}]}}
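quorum_status and mon_status return JSON, so the monitor side can be ruled out programmatically. Below is a minimal sketch (the function name is hypothetical) that lists monitors present in the monmap but missing from quorum_names. It uses the quorum_status output above verbatim. With a single monitor that is its own leader, the list is empty, which suggests the HEALTH_WARN comes from the PGs rather than from the monitors.

```python
import json


def mons_out_of_quorum(quorum_status_json: str) -> list[str]:
    """Names of monitors in the monmap that are missing from quorum_names."""
    status = json.loads(quorum_status_json)
    in_quorum = set(status["quorum_names"])
    return [mon["name"] for mon in status["monmap"]["mons"] if mon["name"] not in in_quorum]


# Copied verbatim from the quorum_status output above (raw string keeps the "\/").
raw = r'{"election_epoch":2,"quorum":[0],"quorum_names":["mon"],"quorum_leader_name":"mon","monmap":{"epoch":1,"fsid":"b4419183-5320-4701-aae2-eb61e186b443","modified":"0.000000","created":"0.000000","mons":[{"rank":0,"name":"mon","addr":"172.17.1.16:6789\/0"}]}}'
print(mons_out_of_quorum(raw))  # []
```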

 

ceph> osd tree

ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY

-1 4.94997 root default                                    

-2 2.24998     host osd1                                   

 0 0.45000         osd.0     down        0          1.00000

 1 0.45000         osd.1     down        0          1.00000

 2 0.45000         osd.2       up  1.00000          1.00000

 3 0.45000         osd.3       up  1.00000          1.00000

10 0.45000         osd.10      up  1.00000          1.00000

-3 1.34999     host osd2                                   

 4 0.45000         osd.4       up  1.00000          1.00000

 5 0.45000         osd.5       up  1.00000          1.00000

 6 0.45000         osd.6       up  1.00000          1.00000

-4 1.34999     host osd3                                   

 7 0.45000         osd.7       up  1.00000          1.00000

 8 0.45000         osd.8       up  1.00000          1.00000

 9 0.45000         osd.9       up  1.00000          1.00000
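The osdmap line above reports 11 OSDs with only 9 up and 9 in, and the tree identifies the two missing ones. Below is a minimal sketch (the function name is hypothetical) that picks out down OSDs per host from the plain-text tree. The sample is a trimmed excerpt of the output above.

```python
def down_osds(tree_text: str) -> dict[str, list[str]]:
    """Map each host to the OSDs reported 'down' in plain 'ceph osd tree' output."""
    result: dict[str, list[str]] = {}
    host = "(no host)"
    for line in tree_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[2] == "host":
            host = fields[3]  # following OSD rows belong to this host
        elif fields[2].startswith("osd.") and fields[3] == "down":
            result.setdefault(host, []).append(fields[2])
    return result


# Trimmed excerpt of the 'ceph osd tree' output above.
tree = """\
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.94997 root default
-2 2.24998     host osd1
 0 0.45000         osd.0     down        0          1.00000
 1 0.45000         osd.1     down        0          1.00000
 2 0.45000         osd.2       up  1.00000          1.00000
-3 1.34999     host osd2
 4 0.45000         osd.4       up  1.00000          1.00000
"""
print(down_osds(tree))  # {'osd1': ['osd.0', 'osd.1']}
```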

 

 

Admin-node:

[root@admin test-cluster]# cat ceph.conf

[global]

auth_service_required = cephx

filestore_xattr_use_omap = true

auth_client_required = cephx

auth_cluster_required = cephx

mon_host = 172.17.1.16

mon_initial_members = mon

fsid = b4419183-5320-4701-aae2-eb61e186b443

osd pool default size = 2

public network = 172.17.1.0/24

cluster network = 10.0.0.0/24
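Ceph accepts option names written with spaces or underscores, and this file mixes both styles (auth_service_required vs. osd pool default size). Below is a minimal sketch that uses Python's configparser rather than Ceph's own parser. It normalizes the names and checks two things from this thread: the default pool size is 2, and the monitor address lies inside the declared public network. It assumes mon_host holds a single address, as it does here. The function name is hypothetical.

```python
import configparser
import ipaddress


def load_ceph_conf(text: str) -> configparser.ConfigParser:
    """Parse ceph.conf text, treating 'osd pool default size' and
    'osd_pool_default_size' as the same option name."""
    parser = configparser.ConfigParser()
    parser.optionxform = lambda name: name.strip().replace(" ", "_").lower()
    parser.read_string(text)
    return parser


# Excerpt of the ceph.conf above.
conf_text = """\
[global]
mon_host = 172.17.1.16
osd pool default size = 2
public network = 172.17.1.0/24
cluster network = 10.0.0.0/24
"""
g = load_ceph_conf(conf_text)["global"]
print(g.getint("osd_pool_default_size"))  # 2
mon = ipaddress.ip_address(g["mon_host"])
print(mon in ipaddress.ip_network(g["public_network"]))  # True
```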

 

 

How do I diagnose and solve the cluster health issue? Do you need any additional information to help with the diagnosis?

 

Thanks!!

 

Dave




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
