Did you check this? https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg39886.html

-----Original Message-----
From: Daniel Carrasco [mailto:d.carrasco@xxxxxxxxx]
Sent: Tuesday, 17 October 2017 17:49
To: ceph-users@xxxxxxxx
Subject: OSD are marked as down after jewel -> luminous upgrade

Hello,

Today I decided to upgrade my Ceph cluster to the latest LTS version, following the steps posted in the release notes:
http://ceph.com/releases/v12-2-0-luminous-released/

After upgrading all the daemons I noticed that every OSD daemon is marked as down even though they are all running, so the cluster goes down. Maybe the problem is the command "ceph osd require-osd-release luminous", but all OSDs are already on the Luminous version.
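For reference, a minimal sketch of the commands involved (not the full upgrade procedure; the require-osd-release step is only meant to be run once every OSD already reports a 12.2.x version):

# ceph versions                           <- per daemon type, output below
# ceph osd versions                       <- OSDs only, output below
# ceph osd require-osd-release luminous   <- run last, once all OSDs are on luminous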
----------------------------------------------------------------------

# ceph versions
{
    "mon": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2
    },
    "mds": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2
    },
    "overall": {
        "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 10
    }
}

----------------------------------------------------------------------

# ceph osd versions
{
    "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2
}

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME                  STATUS REWEIGHT PRI-AFF
-1       0.08780 root default
-2       0.04390     host alantra_fs-01
 0   ssd 0.04390         osd.0                  up  1.00000 1.00000
-3       0.04390     host alantra_fs-02
 1   ssd 0.04390         osd.1                  up  1.00000 1.00000
-4             0     host alantra_fs-03

----------------------------------------------------------------------

# ceph -s
  cluster:
    id:     5f8e66b5-1adc-4930-b5d8-c0f44dc2037e
    health: HEALTH_WARN
            nodown flag(s) set

  services:
    mon: 3 daemons, quorum alantra_fs-02,alantra_fs-01,alantra_fs-03
    mgr: alantra_fs-03(active), standbys: alantra_fs-01, alantra_fs-02
    mds: cephfs-1/1/1 up {0=alantra_fs-01=up:active}, 1 up:standby
    osd: 2 osds: 2 up, 2 in
         flags nodown

  data:
    pools:   3 pools, 192 pgs
    objects: 40177 objects, 3510 MB
    usage:   7486 MB used, 84626 MB / 92112 MB avail
    pgs:     192 active+clean

  io:
    client: 564 kB/s rd, 767 B/s wr, 33 op/s rd, 0 op/s wr

----------------------------------------------------------------------

Log:

2017-10-17 16:15:25.466807 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 29.864632 seconds
2017-10-17 16:15:25.467557 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-10-17 16:15:25.467587 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:15:27.494526 mon.alantra_fs-02 [WRN] Health check failed: Degraded data redundancy: 63 pgs unclean (PG_DEGRADED)
2017-10-17 16:15:27.501956 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-10-17 16:15:27.501997 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2017-10-17 16:15:27.502012 mon.alantra_fs-02 [INF] Cluster is now healthy
2017-10-17 16:15:27.518798 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot
2017-10-17 16:15:26.414023 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running
2017-10-17 16:15:30.470477 mon.alantra_fs-02 [INF] osd.1 marked down after no beacon for 25.007336 seconds
2017-10-17 16:15:30.471014 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-10-17 16:15:30.471047 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:15:30.532427 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; Degraded data redundancy: 63 pgs unclean
2017-10-17 16:15:31.590661 mon.alantra_fs-02 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 63 pgs unclean)
2017-10-17 16:15:34.703027 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-10-17 16:15:34.703061 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2017-10-17 16:15:34.703078 mon.alantra_fs-02 [INF] Cluster is now healthy
2017-10-17 16:15:34.714002 mon.alantra_fs-02 [INF] osd.1 10.20.1.97:6801/2310 boot
2017-10-17 16:15:33.614640 osd.1 [WRN] Monitor daemon marked osd.1 down, but it is still running
2017-10-17 16:15:35.767050 mon.alantra_fs-02 [WRN] Health check failed: Degraded data redundancy: 40176/80352 objects degraded (50.000%), 63 pgs unclean, 192 pgs degraded (PG_DEGRADED)
2017-10-17 16:15:40.852094 mon.alantra_fs-02 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 19555/80352 objects degraded (24.337%), 63 pgs unclean, 96 pgs degraded)
2017-10-17 16:15:40.852129 mon.alantra_fs-02 [INF] Cluster is now healthy
2017-10-17 16:15:55.475549 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.005072 seconds
2017-10-17 16:15:55.476086 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-10-17 16:15:55.476114 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:15:57.557651 mon.alantra_fs-02 [WRN] Health check failed: Degraded data redundancy: 63 pgs unclean (PG_DEGRADED)
2017-10-17 16:15:57.558176 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-10-17 16:15:57.558206 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2017-10-17 16:15:57.558230 mon.alantra_fs-02 [INF] Cluster is now healthy
2017-10-17 16:15:57.596646 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot
2017-10-17 16:15:56.447979 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running
2017-10-17 16:16:00.479015 mon.alantra_fs-02 [INF] osd.1 marked down after no beacon for 25.004706 seconds
2017-10-17 16:16:00.479536 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-10-17 16:16:00.479577 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:16:01.634966 mon.alantra_fs-02 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 63 pgs unclean)
2017-10-17 16:16:02.643766 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-10-17 16:16:02.643798 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2017-10-17 16:16:02.643815 mon.alantra_fs-02 [INF] Cluster is now healthy
2017-10-17 16:16:02.691761 mon.alantra_fs-02 [INF] osd.1 10.20.1.97:6801/2310 boot
2017-10-17 16:16:01.153925 osd.1 [WRN] Monitor daemon marked osd.1 down, but it is still running
2017-10-17 16:16:25.497378 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.018358 seconds
2017-10-17 16:16:25.497946 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-10-17 16:16:25.497973 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:16:27.517811 mon.alantra_fs-02 [WRN] Health check failed: Degraded data redundancy: 62 pgs unclean (PG_DEGRADED)
2017-10-17 16:16:28.538617 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2017-10-17 16:16:28.538647 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2017-10-17 16:16:28.552535 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot
2017-10-17 16:16:27.287020 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running
2017-10-17 16:16:30.500686 mon.alantra_fs-02 [INF] osd.1 marked down after no beacon for 25.007173 seconds
2017-10-17 16:16:30.501217 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2017-10-17 16:16:30.501250 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:16:30.532618 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; Degraded data redundancy: 62 pgs unclean
2017-10-17 16:16:34.869504 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 63 pgs unclean, 192 pgs degraded (PG_DEGRADED)
2017-10-17 16:16:34.192978 osd.1 [WRN] Monitor daemon marked osd.1 down, but it is still running
2017-10-17 16:16:55.505503 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.004803 seconds
2017-10-17 16:16:55.506192 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN)
2017-10-17 16:16:55.506223 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN)
2017-10-17 16:16:55.506242 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN)
2017-10-17 16:16:56.530112 mon.alantra_fs-02 [INF] Health check cleared: OSD_ROOT_DOWN (was: 1 root (2 osds) down)
2017-10-17 16:16:56.554446 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot
2017-10-17 16:16:55.542656 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running
2017-10-17 16:17:00.524161 mon.alantra_fs-02 [WRN] Health check update: 1 osds down (OSD_DOWN)
2017-10-17 16:17:00.524217 mon.alantra_fs-02 [WRN] Health check update: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:17:00.553635 mon.alantra_fs-02 [INF] mon.1 10.20.1.109:6789/0
2017-10-17 16:17:00.553691 mon.alantra_fs-02 [INF] mon.2 10.20.1.216:6789/0
2017-10-17 16:17:16.885662 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 96 pgs unclean, 192 pgs degraded (PG_DEGRADED)
2017-10-17 16:17:25.528348 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.004060 seconds
2017-10-17 16:17:25.528960 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN)
2017-10-17 16:17:25.528991 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN)
2017-10-17 16:17:25.529011 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN)
2017-10-17 16:17:26.544228 mon.alantra_fs-02 [INF] Health check cleared: OSD_ROOT_DOWN (was: 1 root (2 osds) down)
2017-10-17 16:17:26.568819 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot
2017-10-17 16:17:25.557037 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running
2017-10-17 16:17:30.532840 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; Degraded data redundancy: 40177/80354 objects degraded (50.000%), 96 pgs unclean, 192 pgs degraded
2017-10-17 16:17:30.538294 mon.alantra_fs-02 [WRN] Health check update: 1 osds down (OSD_DOWN)
2017-10-17 16:17:30.538333 mon.alantra_fs-02 [WRN] Health check update: 1 host (1 osds) down (OSD_HOST_DOWN)
2017-10-17 16:17:31.602434 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded (PG_DEGRADED)
2017-10-17 16:17:55.540005 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.001599 seconds
2017-10-17 16:17:55.540538 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN)
2017-10-17 16:17:55.540562 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN)
2017-10-17 16:17:55.540585 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN)
2017-10-17 16:18:28.916734 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded, 192 pgs undersized (PG_DEGRADED)
2017-10-17 16:18:30.533096 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 2 osds down; 3 hosts (2 osds) down; 1 root (2 osds) down; Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded, 192 pgs undersized
2017-10-17 16:18:56.929295 mon.alantra_fs-02 [WRN] Health check failed: Reduced data availability: 192 pgs stale (PG_AVAILABILITY)

----------------------------------------------------------------------

ceph.conf:

[global]
fsid = 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e
mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
public_network = 10.20.1.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

##
### OSD
##
[osd]
osd_mon_heartbeat_interval = 5
osd_mon_report_interval_max = 10
osd_heartbeat_grace = 10
osd_fast_fail_on_connection_refused = True
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 2
osd_pool_default_min_size = 2

##
### Monitors
##
[mon]
mon_allow_pool_delete = false
mon_osd_report_timeout = 25
mon_osd_min_down_reporters = 1

[mon.alantra_fs-01]
host = alantra_fs-01
mon_addr = 10.20.1.109:6789

[mon.alantra_fs-02]
host = alantra_fs-02
mon_addr = 10.20.1.97:6789

[mon.alantra_fs-03]
host = alantra_fs-03
mon_addr = 10.20.1.216:6789

##
### MDS
##
[mds]
mds_cache_size = 250000

##
### Client
##
[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 629145600
rbd_cache = true
rbd_cache_size = 671088640

----------------------------------------------------------------------
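One detail that may be related (just a hedged guess from the log and config above, not a confirmed diagnosis): the OSDs are marked down "after no beacon for ~25 seconds", and the [mon] section sets mon_osd_report_timeout = 25. In Luminous the OSDs report liveness to the monitors via periodic beacons, so if the monitor's report timeout is shorter than the OSD beacon interval, healthy OSDs can be flagged down over and over. A sketch of the kind of adjustment one might test (the values are only illustrative, not verified for this cluster):

[mon]
# raise the timeout well above the OSD beacon interval
mon_osd_report_timeout = 900

[osd]
# or, alternatively, make the OSDs send beacons more often than the mon timeout
osd_beacon_report_interval = 20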
For now I've set the nodown flag to keep all OSDs online, and everything is working fine, but this is not the best way to handle it. Does anyone know how to fix this problem? Maybe this release needs new ports opened on the firewall?

Thanks!!

--
_________________________________________
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_________________________________________

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com