Hi,

I would be interested to know why I get two different answers (is this asking the OSDs directly?):

- ceph osd dump shows osd.0 in, up, with up_from 179 up_thru 185 down_at 176, and osd.1/2 in, up, with up_from 8/13 up_thru 224 down_at 0.
- ceph -s reports 1 OSD down.

So all OSDs in the dump are in and up, but ... I guess osd.0 is telling me when it came back, and that it does not yet have all the data that osd.1/2 have, because they go up to 224. Is that why ceph -s tells me that one OSD is down? Or did the leading MON not get fully informed?

Which daemon is missing which part of the communication? And what type of warning/error should I look for in the log files?

Thanx,
--WjW

4: /home/wjw/wip/qa/workunits/cephtool/test.sh:12: check_no_osd_down: ceph osd dump
4: epoch 230
4: fsid d00499b7-6b4c-4e71-a862-420b0d921097
4: created 2016-09-05 13:10:08.247894
4: modified 2016-09-05 13:13:17.952969
4: flags sortbitwise,require_jewel_osds,require_kraken_osds
4: pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1 flags hashpspool stripe_width 0
4: max_osd 3
4: osd.0 up in weight 1 up_from 179 up_thru 185 down_at 176 last_clean_interval [4,178) 127.0.0.1:6800/27490 127.0.0.1:6812/1027490 127.0.0.1:6813/1027490 127.0.0.1:6814/1027490 exists,up cb894225-670c-4e80-8e15-123c161cd00e
4: osd.1 up in weight 1 up_from 8 up_thru 224 down_at 0 last_clean_interval [0,0) 127.0.0.1:6804/27505 127.0.0.1:6805/27505 127.0.0.1:6806/27505 127.0.0.1:6807/27505 exists,up 8cf8bf1f-27ce-49bb-88cf-c1d54511c434
4: osd.2 up in weight 1 up_from 13 up_thru 224 down_at 0 last_clean_interval [0,0) 127.0.0.1:6808/27520 127.0.0.1:6809/27520 127.0.0.1:6810/27520 127.0.0.1:6811/27520 exists,up 568223b6-06d6-41d0-92c3-286227819bb5
4: pg_temp 0.2 [1,2]
4: pg_temp 0.6 [1,2]
4: pg_temp 0.7 [1,2]
4: /home/wjw/wip/qa/workunits/cephtool/test.sh:13: check_no_osd_down: ceph -s
4: cluster d00499b7-6b4c-4e71-a862-420b0d921097
4: health HEALTH_WARN
4: 5 pgs peering
4: 3 pgs stale
4: 1/3 in osds are down
4: monmap e1: 3 mons at {a=127.0.0.1:7202/0,b=127.0.0.1:7203/0,c=127.0.0.1:7204/0}
4: election epoch 6, quorum 0,1,2 a,b,c
4: osdmap e232: 3 osds: 2 up, 3 in; 5 remapped pgs
4: flags sortbitwise,require_jewel_osds,require_kraken_osds
4: pgmap v326: 8 pgs, 1 pools, 0 bytes data, 0 objects
4: 200 GB used, 248 GB / 448 GB avail
4: 3 stale+active+clean
4: 3 remapped+peering
4: 2 peering
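One detail visible in the two captures is that they come from different osdmap epochs: the dump says "epoch 230" while ceph -s reports "osdmap e232". A minimal sketch of how one could extract and compare the two epochs from captured output; the inline printf captures and the file names dump.txt/status.txt are my own for illustration, not part of test.sh:

```shell
# Hypothetical helper, not part of test.sh: compare the osdmap epoch
# seen by "ceph osd dump" with the one reported by "ceph -s".
# The captures are recreated inline here from the output above;
# normally you would run:
#   ceph osd dump > dump.txt
#   ceph -s > status.txt
printf 'epoch 230\n' > dump.txt
printf 'osdmap e232: 3 osds: 2 up, 3 in; 5 remapped pgs\n' > status.txt

# "ceph osd dump" prints its epoch on a line like "epoch 230"
dump_epoch=$(awk '/^epoch /{print $2}' dump.txt)
# "ceph -s" embeds its epoch in a line like "osdmap eNNN: ..."
status_epoch=$(sed -n 's/.*osdmap e\([0-9]*\):.*/\1/p' status.txt)

if [ "$dump_epoch" != "$status_epoch" ]; then
    echo "epoch skew: dump=$dump_epoch status=$status_epoch"
fi
```

With the output shown above this prints "epoch skew: dump=230 status=232", i.e. the two commands answered from different versions of the map.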