Multiple MDS Daemons needed?

Hi All,

We have a slurm cluster with 25 clients, each with 256 cores, each mounting a cephfs filesystem as their main storage target. The workload can be heavy at times.

We have two active MDS daemons and one standby. A lot of the time everything is healthy, but we sometimes get warnings about MDS daemons being slow on requests, behind on trimming, etc. I realize there may be a bug in play, but I was also wondering whether we simply don't have enough MDS daemons to handle the load. Is there a way to know if adding an MDS daemon would help? We could add a third active MDS if needed, but I don't want to start adding a bunch of MDSs if that won't help.
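So far the only load indicators I know to compare are the per-rank numbers from ceph fs status (Reqs/s, inodes, caps held) and the per-daemon perf counters, e.g.:

# ceph fs status slugfs
# ceph tell mds.slugfs.pr-md-03.mclckv perf dump

If a third active rank does turn out to be the answer, my understanding is that it's just a matter of raising max_mds and letting a standby pick up rank 2, something like:

# ceph fs set slugfs max_mds 3

but I don't know how to tell in advance whether the load would actually spread across the extra rank, since that depends on how the balancer distributes subtrees (or on pinning).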

The OSD servers seem fine. It's mainly the MDS instances that are complaining.

We are running reef 18.2.1.

For reference, when things look healthy:

# ceph fs status slugfs
slugfs - 34 clients
======
RANK  STATE            MDS              ACTIVITY      DNS    INOS   DIRS   CAPS
 0    active  slugfs.pr-md-03.mclckv  Reqs:  273 /s  2759k  2636k   362k  1079k
 1    active  slugfs.pr-md-01.xdtppo  Reqs:  194 /s   868k   674k  67.3k   351k
       POOL           TYPE     USED  AVAIL
 cephfs_metadata    metadata   127G  3281G
cephfs_md_and_data    data       0   98.3T
   cephfs_data        data     740T   196T
     STANDBY MDS
slugfs.pr-md-02.sbblqq
MDS version: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)

# ceph -s
  cluster:
    id:     58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
    mgr: pr-md-01.jemmdf(active, since 5w), standbys: pr-md-02.emffhz
    mds: 2/2 daemons up, 1 standby
    osd: 46 osds: 46 up (since 8d), 46 in (since 4w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1313 pgs
    objects: 271.17M objects, 493 TiB
    usage:   744 TiB used, 384 TiB / 1.1 PiB avail
    pgs:     1307 active+clean
             4    active+clean+scrubbing
             2    active+clean+scrubbing+deep

  io:
    client:   39 MiB/s rd, 108 MiB/s wr, 1.96k op/s rd, 54 op/s wr




But when things are in "warning" mode, it looks like this:

# ceph -s
  cluster:
    id:     58bde08a-d7ed-11ee-9098-506b4b4da440
    health: HEALTH_WARN
            1 filesystem is degraded
            1 clients failing to advance oldest client/flush tid
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 5 daemons, quorum pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
    mgr: pr-md-01.jemmdf(active, since 5w), standbys: pr-md-02.emffhz
    mds: 2/2 daemons up, 1 standby
    osd: 46 osds: 46 up (since 8d), 46 in (since 4w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1313 pgs
    objects: 271.28M objects, 494 TiB
    usage:   746 TiB used, 382 TiB / 1.1 PiB avail
    pgs:     1307 active+clean
             5    active+clean+scrubbing
             1    active+clean+scrubbing+deep

  io:
    client:   55 MiB/s rd, 2.6 MiB/s wr, 15 op/s rd, 46 op/s wr

And this:

# ceph health detail
HEALTH_WARN 2 clients failing to advance oldest client/flush tid; 2 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest client/flush tid
    mds.slugfs.pr-md-01.xdtppo(mds.0): Client phoenix-06.prism failing to advance its oldest client/flush tid. client_id: 125780
    mds.slugfs.pr-md-02.sbblqq(mds.1): Client phoenix-00.prism failing to advance its oldest client/flush tid. client_id: 99385
[WRN] MDS_SLOW_REQUEST: 2 MDSs report slow requests
    mds.slugfs.pr-md-01.xdtppo(mds.0): 4 slow requests are blocked > 30 secs
    mds.slugfs.pr-md-02.sbblqq(mds.1): 67 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.slugfs.pr-md-02.sbblqq(mds.1): Behind on trimming (109410/250) max_segments: 250, num_segments: 109410
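When it gets into this state I can also pull more detail from the affected MDS before restarting, if that would help pin down the client or op involved, e.g.:

# ceph tell mds.slugfs.pr-md-02.sbblqq ops
# ceph tell mds.slugfs.pr-md-02.sbblqq dump_blocked_ops
# ceph tell mds.slugfs.pr-md-02.sbblqq session ls

I can capture that output the next time it happens if it would be useful.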

The "cure" is the restart the active MDS daemons, one at a time. Then everything becomes healthy again, for a time.

We also have the following MDS config items in play:

mds_cache_memory_limit = 8589934592
mds_cache_trim_decay_rate = .6
mds_log_max_segments = 250
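I can confirm what the running daemons actually report for these if that's useful, e.g.:

# ceph config get mds mds_log_max_segments
# ceph tell mds.slugfs.pr-md-01.xdtppo config get mds_log_max_segments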

Thanks for any pointers!

cheers,
erich