On Thu, Apr 26, 2018 at 3:16 PM, Scottix <scottix@xxxxxxxxx> wrote:
> Updated to 12.2.5
>
> We are starting to test multi_mds cephfs and we are going through some
> failure scenarios in our test cluster.
>
> We are simulating a power failure to one machine and we are getting
> mixed results of what happens to the file system.
>
> This is the status of the mds once we simulate the power loss,
> considering there are no more standbys:
>
> mds: cephfs-2/2/2 up
> {0=CephDeploy100=up:active,1=TigoMDS100=up:active(laggy or crashed)}
>
> 1. It is a little unclear if it is laggy or really is down, using this
> line alone.

Of course -- the mons can't tell the difference!

> 2. The first time we lost total access to the ceph folder and i/o just
> blocked.

You must have standbys for high availability. This is in the docs.

> 3. One time we were still able to access the ceph folder and everything
> seemed to be running.

It depends(tm) on how the metadata is distributed and what locks are
held by each MDS. Standbys are not optional in any production cluster.

--
Patrick Donnelly
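
A minimal sketch of making sure a standby exists, assuming the
filesystem is named "cephfs" (as in the status line above), that
ceph-deploy is in use, and that a spare host (here called
"mds-standby", a placeholder name) is available for the extra daemon:

    # deploy one more MDS daemon to act as a standby
    ceph-deploy mds create mds-standby

    # ask the cluster to keep at least one standby available;
    # if the count drops below this, "ceph health" raises a warning
    ceph fs set cephfs standby_count_wanted 1

    # verify: the extra daemon should show up as a standby
    ceph fs status cephfs

With a standby in place, an active MDS that stops sending beacons to
the mons is replaced by the standby instead of leaving the rank stuck
in "laggy or crashed".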