Re: HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

Finally I've created three nodes, increased the size of the pools to 3, and created 3 MDS daemons (one active, two standby).
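(For reference, a minimal sketch of the commands involved; the pool names cephfs_data and cephfs_metadata are assumptions about this cluster:)

  # Raise replication to 3 on both CephFS pools
  ceph osd pool set cephfs_data size 3
  ceph osd pool set cephfs_metadata size 3
  # Confirm there is one active MDS and two standbys
  ceph mds stat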

Today the server decided to fail and I've noticed that failover is not working... The ceph -s command showed everything as OK, but the clients weren't able to connect, and I had to restart the failing node and reconnect the clients manually to make it work again (even though I think the active MDS was on another node).

I don't know if maybe it's because the server was not fully down and only some connections were failing. I'll do some tests to see.
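(A quick way to exercise MDS failover without taking a whole node down, assuming the active MDS holds rank 0, is a forced fail:)

  # Mark the active MDS as failed and watch a standby take over
  ceph mds fail 0
  # Check that the filesystem comes back to an active state
  ceph mds stat
  ceph -s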

Another question: how much memory does a node need to work? I have nodes with 2GB of RAM (one MDS, one MON and one OSD each), and they show high memory usage (more than 1GB on the OSD).
The OSD is 50GB in size and the data it contains is less than 3GB.
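(A rough sketch for checking where the memory goes on such a node; the heap commands only work on builds linked against tcmalloc, and osd.0 is just an example daemon id:)

  # Resident memory of the Ceph daemons running on this node
  ps -C ceph-osd,ceph-mon,ceph-mds -o pid,rss,comm
  # Ask the OSD to report its heap usage and release unused pages (tcmalloc builds)
  ceph tell osd.0 heap stats
  ceph tell osd.0 heap release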

Thanks, and Greetings!!

2017-06-12 23:33 GMT+02:00 Mazzystr <mazzystr@xxxxxxxxx>:
Since your app is an Apache / PHP app, is it possible for you to reconfigure it to use an S3 module rather than a POSIX open/file()?  Then with Ceph, drop CephFS and configure a Civetweb S3 gateway.  You can have "active-active" endpoints with round-robin DNS or an F5 or something.  You would also have to repopulate the objects into the rados pools.
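(A minimal sketch of what a Civetweb RGW endpoint looks like in ceph.conf; the instance name client.rgw.gw1, the host and the port are assumptions:)

  [client.rgw.gw1]
  host = gw1
  rgw_frontends = "civetweb port=7480"

Two or more of these instances behind round-robin DNS or a load balancer give the active-active endpoints mentioned above.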

Also increase that size parameter to 3.  ;-)

Lots of work for active-active, but the whole stack will be much more resilient. (Coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo background.)



On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
2017-06-12 16:10 GMT+02:00 David Turner <drakonstein@xxxxxxxxx>:
I have an incredibly light-weight cephfs configuration.  I set up an MDS on each mon (3 total), and have 9TB of data in cephfs.  This data only has 1 client that reads a few files at a time.  I haven't noticed any downtime when it fails over to a standby MDS.  So it definitely depends on your workload as to how a failover will affect your environment.

On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetrini@xxxxxxxxxxxx> wrote:
We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again but it did recover gracefully.

[mds]
max_mds = 1
mds_standby_for_rank = 0
mds_standby_replay = true
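(With standby replay the standby MDS continuously follows the active MDS's journal, which shortens takeover time. A quick check that the daemons picked up the roles, names aside:)

  # Should show one MDS up:active and another following it in standby-replay
  ceph mds stat
  ceph fs dump | grep -i standby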

___

John Petrini



Thanks to both.
Right now I'm working on that because I need a very fast failover. For now the tests give me a very fast response when an OSD fails (about 5 seconds), but a very slow response when the main MDS fails (I haven't measured the real time, but it stayed down for quite a while). Maybe it was because I created the other MDS after mounting, because I did some tests just before sending this email and now it looks very fast (I haven't noticed any downtime).
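(If the slow takeover shows up again, the mon-side beacon timeout is the knob that decides how long a dead MDS goes unnoticed; these are the stock option names with their default values, not values tuned for this cluster:)

  [global]
  # The MDS beacons to the mons every mds_beacon_interval seconds; after
  # mds_beacon_grace seconds without a beacon it is marked laggy and a
  # standby takes over. Lowering the grace speeds up failover at the risk
  # of false positives under load.
  mds_beacon_interval = 4
  mds_beacon_grace = 15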

Greetings!!


--
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________






--
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
