Re: Multi-MDS Failover

Scottix <scottix@xxxxxxxxx> · Fri, 27 Apr 2018 02:04:21 +0000

OK, let me try to explain this better; we keep going back and forth and it's not going anywhere. I'll just be as genuine as I can and explain the issue.
What we are testing is a critical failure scenario, and actually more of a real-world scenario: what happens when it is 1 AM, the shit hits the fan, half of your servers are down, and only 1 of the 3 MDS boxes is still alive.
There is one very important fact about CephFS with a single active MDS: when that MDS fails, all IO is blocked. No split-brain, no corrupted data. That has been 100% guaranteed ever since we started using CephFS.

Now with multi_mds, I understand this changes the logic, and I understand how difficult this problem is; trust me, I would not be able to tackle it. Basically, I need to answer the question: what happens when 1 of 2 multi_mds daemons fails with no standbys ready to take over?
What I have tested does not behave the same as a single active MDS; this absolutely changes the logic of what happens and how we troubleshoot. CephFS is still alive, it still allows operations, and it still lets resources through. How, why, and what is affected are very relevant questions if this is what the failure looks like, since it is not 100% blocking.

This is the problem: I have programs writing a massive amount of data, and I don't want it corrupted or lost. I need to know what happens, and I need guarantees.

Best

On Thu, Apr 26, 2018 at 5:03 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
On Thu, Apr 26, 2018 at 4:40 PM, Scottix <scottix@xxxxxxxxx> wrote:

>> Of course -- the mons can't tell the difference!

> That is really unfortunate, it would be nice to know if the filesystem has
> been degraded and to what degree.

If a rank is laggy/crashed, the file system as a whole is generally
unavailable. The span between partial outage and full is small and not
worth quantifying.

>> You must have standbys for high availability. This is in the docs.

> Ok, but what if your standby goes down and a master goes down? This
> could happen in the real world and is a valid error scenario.
> Also, there is a period before the standby becomes active. What happens
> during that time?

The standby MDS goes through a series of states where it recovers the
lost state and connections with clients. Finally, it goes active.
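
As a rough illustration of the "series of states" mentioned above, here is a minimal sketch. The state names are taken from the Ceph MDS states documentation, not from this thread, and the helper recovery_path is hypothetical (it is not a Ceph API). It assumes up:resolve is only entered when the file system has more than one rank, and that up:clientreplay may or may not be entered on a given takeover.

```python
# States a standby passes through when it takes over a failed rank.
# Names follow the Ceph MDS states documentation; this list is an
# illustration, not output captured from a real cluster.
RECOVERY_STATES = [
    "up:replay",        # replay the failed rank's journal
    "up:resolve",       # multi-rank only: resolve uncommitted inter-MDS operations
    "up:reconnect",     # wait for clients to reconnect
    "up:rejoin",        # rejoin the MDS cluster cache
    "up:clientreplay",  # may be entered: replay client requests not yet durable
    "up:active",
]


def recovery_path(active_ranks):
    """Return the takeover states for a file system with `active_ranks` ranks.

    Hypothetical helper: it only filters the list above and does not
    talk to a cluster.
    """
    if active_ranks < 1:
        raise ValueError("a file system needs at least one rank")
    return [s for s in RECOVERY_STATES
            if s != "up:resolve" or active_ranks > 1]


if __name__ == "__main__":
    print(" -> ".join(recovery_path(2)))
```

Until the daemon reaches up:active, the rank it is taking over is not serving requests, which is the in-between period asked about above.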

>> It depends(tm) on how the metadata is distributed and what locks are
>> held by each MDS.

> You're saying that, depending on which MDS had a lock on a resource, it
> will block that particular POSIX operation? Can you clarify a little bit?

>> Standbys are not optional in any production cluster.

> Of course in production I would hope people have standbys, but in theory
> there is no enforcement in Ceph for this other than a warning. So when you
> say not optional, that is not exactly true; it will still run.

It's self-defeating to expect CephFS to enforce having standbys --
presumably by throwing an error or becoming unavailable -- when the
standbys exist to make the system available.

There's nothing to enforce. A warning is sufficient for the operator
that (a) they didn't configure any standbys or (b) MDS daemon
processes/boxes are going away and not coming back as standbys (i.e.
the pool of MDS daemons is decreasing with each failover).
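
To make case (b) concrete, here is a toy sketch of a shrinking daemon pool. Both functions are hypothetical helpers written for this illustration; they do not query a cluster. The threshold is modeled on the per-file-system standby_count_wanted setting (set with "ceph fs set <fs_name> standby_count_wanted <count>") and the MDS_INSUFFICIENT_STANDBY health warning from the Ceph documentation, neither of which is named in this thread.

```python
def standby_warning(standbys, wanted=1):
    """Return a warning string if fewer standbys exist than wanted, else None."""
    if standbys >= wanted:
        return None
    return f"insufficient standby MDS daemons: have {standbys}, want {wanted}"


def simulate_failovers(daemons, active_ranks, failures, wanted=1):
    """Model daemons dying one at a time and never coming back.

    Returns one tuple per failure:
    (daemons_left, standbys_left, failed_ranks, warning_or_None).
    Assumes a surviving standby always takes over a failed rank.
    """
    history = []
    for _ in range(failures):
        daemons = max(daemons - 1, 0)
        standbys = max(daemons - active_ranks, 0)      # spares beyond the active ranks
        failed_ranks = max(active_ranks - daemons, 0)  # ranks nobody can serve
        history.append((daemons, standbys, failed_ranks,
                        standby_warning(standbys, wanted)))
    return history


if __name__ == "__main__":
    # 3 MDS boxes, 2 active ranks, 2 boxes lost: the scenario from this thread.
    for step in simulate_failovers(daemons=3, active_ranks=2, failures=2):
        print(step)
```

In this model the warning already appears after the first failure, while both ranks are still being served; the second failure is the one that leaves a rank with no daemon.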

-- 

Patrick Donnelly

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com