Moving this to ceph-users.

On Wed, Jul 27, 2016 at 8:36 AM, Kostya Velychkovsky <velychkovsky@xxxxxxxxx> wrote:
> Hello. I have a test Ceph cluster with 5 nodes: 3 MON and 2 OSD.
>
> This is my ceph.conf:
>
> [global]
> fsid = 714da611-2c40-4930-b5b9-d57e70d5cf7e
> mon_initial_members = node1
> mon_host = node1,node3,node4
>
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd_pool_default_size = 2
> public_network = X.X.X.X/24
>
> [mon]
> osd report timeout = 15
> osd min down reports = 2
>
> [osd]
> mon report interval max = 30
> mon heartbeat interval = 15
>
> While running failure tests I hard-reset one OSD node, and it takes a
> long time (~15 minutes) before Ceph marks that OSD down. Meanwhile
> ceph -s reports the cluster as healthy:
> ---------------
>     cluster 714da611-2c40-4930-b5b9-d57e70d5cf7e
>      health HEALTH_OK
>      monmap e5: 3 mons at ....
>             election epoch 272, quorum 0,1,2 node1,node3,node4
>      osdmap e90: 2 osds: 2 up, 2 in
> ---------------
> Only after ~15 minutes do the mon nodes mark this OSD down and change
> the cluster state:
> ----------------
>      osdmap e86: 2 osds: 1 up, 2 in; 64 remapped pgs
>             flags sortbitwise
>       pgmap v3927: 64 pgs, 1 pools, 10961 MB data, 2752 objects
>             22039 MB used, 168 GB / 189 GB avail
>             2752/5504 objects degraded (50.000%)
>                   64 active+undersized+degraded
> ---------------
>
> I tried to adjust 'osd report timeout' but got the same result.
>
> Can you please help me tune my cluster to decrease this reaction time?
>
> --
> Best Regards
>
> Kostiantyn Velychkovsky

--

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com || http://community.redhat.com
@scuttlemonkey || @ceph
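
A note for anyone hitting the same ~15 minute delay: the settings quoted above appear to be missing their daemon prefixes, so they probably never take effect. Ceph option names are absolute regardless of which config section they sit in; for example, the monitor-side backstop is 'mon osd report timeout' (default 900 s, which matches the observed delay exactly), not 'osd report timeout' under [mon]. Below is a minimal, untested sketch of the relevant options with their full Jewel-era names; the values are illustrative starting points, not tuning recommendations.

---------------
[global]
# Monitor backstop: mark an OSD down if it has not reported in for
# this long. The default of 900 s is the ~15 min delay seen above.
mon osd report timeout = 60

# OSD-to-OSD heartbeats: ping peers every 6 s and report a peer down
# after 20 s of silence (these are the stock defaults, listed for
# completeness).
osd heartbeat interval = 6
osd heartbeat grace = 20

# With only two OSDs, the single surviving peer must be enough for
# the mon to accept a down report.
mon osd min down reporters = 1

# Full name of what the original [osd] section tried to set: how long
# an idle OSD may go between status reports to the mon.
osd mon report interval max = 30
---------------

In a two-OSD setup like this the fast path is the peer heartbeat: once the surviving OSD's down report is accepted, the mon should mark the dead peer down after roughly 'osd heartbeat grace' seconds instead of waiting out the 900 s backstop.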