Re: Cephfs multiple active-active MDS stability and optimization

Hi,

We are using 10 active MDSs with v12.2.12 -- it is "stable", but only
because we have several safeguards in place and a lot of operational
experience to keep it that way.
If I were starting a new cluster now, I would use the latest nautilus
or octopus and test the hell out of it before going into prod. Don't
start with mimic now -- it's end-of-life.

First, are you really sure you need multi-active MDS? We only use it
where the metadata workload clearly exceeds what a single active MDS
can handle. Evidence of this would be high, flat-lined CPU usage on
the active MDS, or, better, track the "hcr" (handle_client_request)
metric with your monitoring or locally on an MDS with
"ceph daemonperf mds.`hostname -s`". A single MDS can normally achieve
a few thousand hcr/second at best.
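
For example, a quick way to watch that rate (a rough sketch -- the MDS
name is just an example, and the counter layout can differ slightly
between releases):

  # live view; the "hcr" column is handle_client_request per second
  ceph daemonperf mds.`hostname -s`

  # or read the raw cumulative counter from the admin socket and
  # sample it twice to get a rate
  ceph daemon mds.`hostname -s` perf dump mds_server | grep handle_client_request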

Otherwise, here are some relatively advanced things to try in order to
validate the setup... understanding and succeeding at these should
help with your nerves:
- Start the cluster, run some workloads, try increasing and decreasing
max_mds on the fly and make sure this works well (see the command
sketch after this list)
- Is the metadata balancing working well with your common workloads?
Run your test workloads for hours or days and check that the RSS of
each MDS is not growing unexpectedly
   - Does MDS balancing make sense for your workload, or are there
some places where pinning subdirectories to a rank is worthwhile?
(there is a pinning example after this list)
- With all MDSs active and metadata caches fully loaded, test the
failover to standby several times. Try "nice" failovers (e.g.
systemctl stop ceph-mds.target on an active) as well as "not-so-nice"
failovers (e.g. killall -9 ceph-mds)
- Try the cephfs scrub features. Maybe even intentionally corrupt a
file or direntry object and then check whether cephfs scrub behaves as
expected (see the sketch after this list)
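
To make the first items concrete, roughly the commands involved (the
filesystem name "cephfs" and the path are just examples):

  # grow or shrink the number of active ranks on the fly
  ceph fs set cephfs max_mds 4
  ceph fs set cephfs max_mds 2
  ceph fs status

  # pin a subtree to rank 1; -v -1 removes the pin again
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/busy/dir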

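For the failover and scrub items, something along these lines (again,
the filesystem name is an example, and the tell-based scrub syntax is
the nautilus-and-later one):

  # "nice" and "not-so-nice" failovers, run on an active MDS host
  systemctl stop ceph-mds.target
  killall -9 ceph-mds
  # or mark a rank failed from any admin node
  ceph mds fail 0

  # forward scrub of the whole tree on rank 0
  ceph tell mds.cephfs:0 scrub start / recursive
  ceph tell mds.cephfs:0 scrub status
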
Hope that helps!

Dan

On Thu, Jul 16, 2020 at 3:01 PM huxiaoyu@xxxxxxxxxxxx
<huxiaoyu@xxxxxxxxxxxx> wrote:
>
> Dear Cephers,
>
> I am planning a cephfs cluster with ca. 100 OSD nodes, each of which has 12 disks and 2 NVMe devices (for DB/WAL and the cephfs metadata pool). For performance and scalability reasons, I would like to try multiple MDSs working active-active. From what I learned in the past, I am not sure about the following questions.
>
> 1  Which Ceph version should I run? I had a good experience with Luminous 12.2.13, and I am not yet familiar with Mimic and Nautilus. Is Luminous 12.2.13 stable enough to run multiple active-active MDS servers for CephFS?
>
> 2 If I had to go with Mimic or Nautilus for CephFS, which one is preferable?
>
> 3 I do have some experience with Ceph RBD, but not with CephFS. So my question is, what should I pay attention to when running CephFS? I am somewhat nervous......
>
> best regards,
>
> Samuel
>
>
>
>
>
> huxiaoyu@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



