How To Properly Fail Over an HA Setup

Hello Everyone,

  I've got a 3-node Jewel cluster, and I think I'm missing something.  When I take one of the nodes down for maintenance (kernel upgrades or the like), all of my clients (which use the kernel CephFS client) hang for a couple of minutes before the redundant servers kick in.  Are there commands I should run before taking a box down to speed this up, or some way to make the clients/cluster detect the failure quicker?
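
  For the planned-maintenance case, I imagine something like the sketch below is what I'm missing, but I haven't confirmed it (this assumes file1 currently holds the active MDS rank 0 and the standby on file2 is healthy):

    # keep the cluster from marking the box's OSD out while it is down
    ceph osd set noout

    # hand MDS rank 0 to the standby now, instead of waiting for the
    # beacon timeout after the reboot
    ceph mds fail 0

    # ... kernel upgrade / reboot of file1 ...

    # once its daemons have rejoined the cluster
    ceph osd unset noout

  Is that roughly the right idea?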

  My setup is as follows.  I have three servers: mon1, file1 and file2.  All three boxes run monitor daemons, and file1 and file2 also run MDS and OSD daemons.  My ceph -s looks like this:

   cluster 8ab75485-8141-44ad-8eee-f92f286515ac
     health HEALTH_OK
     monmap e2: 3 mons at {FILE1=10.1.1.201:6789/0,FILE2=10.1.1.202:6789/0,MON1=10.1.1.90:6789/0}
            election epoch 1384, quorum 0,1,2 MON1,FILE1,FILE2
      fsmap e63874: 1/1/1 up {0=FILE1=up:active}, 1 up:standby
     osdmap e639: 2 osds: 2 up, 2 in
            flags sortbitwise,require_jewel_osds
      pgmap v15312803: 128 pgs, 3 pools, 236 GB data, 729 kobjects
            493 GB used, 406 GB / 899 GB avail
                 128 active+clean
  client io 3243 kB/s rd, 776 kB/s wr, 8 op/s rd, 1 op/s wr

and my client fstabs look like this:

10.1.1.90:6789,10.1.1.201:6789,10.1.1.202:6789:/shared /mnt/shared ceph noatime,_netdev,name=webdata,secretfile=/etc/ceph/websecret 0 0
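
For the unplanned-failure side of the question, the only knob I've come across so far is mds_beacon_grace (how long the monitors wait for beacons before declaring the active MDS failed; 15 seconds by default, if I'm reading the docs right).  I'm not at all sure that's where my couple of minutes actually goes, but this is the sort of thing I've been considering:

    # runtime injection, one per monitor (names from my monmap)
    ceph tell mon.MON1  injectargs '--mds_beacon_grace 10'
    ceph tell mon.FILE1 injectargs '--mds_beacon_grace 10'
    ceph tell mon.FILE2 injectargs '--mds_beacon_grace 10'

    # plus the same value in /etc/ceph/ceph.conf so it survives restarts
    [global]
    mds beacon grace = 10

Is that sane, or am I looking in the wrong place?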

Any help would be appreciated.





