Hi,

We are running 10 active MDSs with v12.2.12 -- so it is "stable", but only because we have several mitigations in place and a lot of operational experience. If I were starting a new cluster now, I would use the latest Nautilus or Octopus and test the hell out of it before going into prod. Don't start with Mimic now; it's end-of-life.

First, are you really sure you need multi-active MDS? We only use it where the metadata workload clearly exceeds what a single active MDS can handle. Evidence of this would be high, flat-lined CPU usage on the active MDS, or better, track the "hcr" ("handle_client_request") metric with your monitoring system or locally on an MDS with "ceph daemonperf mds.`hostname -s`". A single MDS can normally achieve a few thousand hcr/second at best.

Otherwise, here are some relatively advanced things to try to validate the setup. Understanding and succeeding at these should help with your nerves:

- Start the cluster, run some workloads, try increasing and decreasing max_mds on the fly, and make sure this works well.
- Is the metadata balancing working well with your common workloads? Run your test workloads for hours or days and check that the RSS of each MDS is not growing unexpectedly.
- Does MDS balancing make sense for your workload, or are there some places where pinning subdirectories to a rank is worthwhile?
- With fully active MDSs and fully loaded metadata caches, test the failover to standby several times. Try "nice" failovers (e.g. systemctl stop ceph-mds.target on an active) as well as "not-so-nice" failovers (e.g. killall -9 ceph-mds).
- Try the cephfs scrub features. Maybe even intentionally corrupt a file or dentry object, then check whether cephfs scrub behaves as expected.

(A few command sketches for these points are appended below the quoted mail.)

Hope that helps!

Dan

On Thu, Jul 16, 2020 at 3:01 PM huxiaoyu@xxxxxxxxxxxx <huxiaoyu@xxxxxxxxxxxx> wrote:
>
> Dear Cepher,
>
> I am planning a CephFS cluster with ca. 100 OSD nodes, each of which has 12 disks and 2 NVMes (for DB/WAL and the CephFS metadata pool). For performance and scalability reasons, I would like to try multiple MDSs working active-active. From what I learned in the past, I am not sure about the following questions.
>
> 1. Which Ceph version should I run? I had a good experience with Luminous 12.2.13, and I am not familiar yet with Mimic and Nautilus. Is Luminous 12.2.13 stable enough to run multiple active-active MDS servers for CephFS?
>
> 2. If I had to go with Mimic or Nautilus for CephFS, which one is preferable?
>
> 3. I do have some experience with Ceph RBD, but not CephFS. So my question is, what should I pay attention to when running CephFS? I am somewhat nervous...
>
> best regards,
>
> Samuel
>
> huxiaoyu@xxxxxxxxxxxx
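Appendix -- a few hedged command sketches for the checks above. These assume a filesystem named "cephfs"; substitute your own fs name, hosts and paths.

Growing and shrinking max_mds on the fly, a minimal sketch:

    # grow to two active ranks, then check that the new rank becomes active
    ceph fs set cephfs max_mds 2
    ceph fs status cephfs

    # shrink back down; on Nautilus and later the surplus rank is stopped
    # automatically, on Luminous/Mimic you still have to deactivate it yourself
    ceph fs set cephfs max_mds 1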
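Pinning a subtree to a rank, assuming a client mount at /mnt/cephfs (hypothetical path):

    # pin one busy project directory to rank 0 and another to rank 1
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects/projA
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/projB

    # -v -1 removes the pin and hands the subtree back to the balancer
    setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projects/projA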
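Watching MDS memory and exercising failover:

    # on each MDS host, compare resident memory against mds_cache_memory_limit
    ps -o pid,rss,vsz,cmd -C ceph-mds

    # "nice" failover, run on one of the active MDS hosts
    systemctl stop ceph-mds.target

    # "not-so-nice" failover
    killall -9 ceph-mds

    # from an admin node, watch a standby take over the rank
    watch ceph fs status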
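Scrub, whose interface differs by release -- check the docs for your exact version:

    # Luminous/Mimic: via the admin socket on the host running the active rank-0 MDS
    ceph daemon mds.$(hostname -s) scrub_path / recursive

    # Nautilus and later: via ceph tell, addressing rank 0 of the filesystem
    ceph tell mds.cephfs:0 scrub start / recursive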