Full-flash CephFS optimization

Dear all,

I have a full-flash NVMe Ceph cluster (16.2.6) with, for now, only the CephFS service configured.

55 nodes, 2 OSD partitions per NVMe drive. I increased the MDS cache memory limit to 128 GB (the admin nodes have 256 GB of RAM). It is a hyperconverged K8s cluster and the OSDs run on the K8s worker nodes, so I set "osd_memory_target" to 16 GB.
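
For reference, this is roughly how I applied those two limits, as a minimal sketch (values are in bytes, 128 GiB and 16 GiB respectively):

$ ceph config set mds mds_cache_memory_limit 137438953472   # 128 GiB MDS cache
$ ceph config set osd osd_memory_target 17179869184         # 16 GiB per OSD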

I regularly hit the warnings below; the CephFS clients get blocked and I have to restart the MDS service to recover (the commands I use to investigate are sketched after the log excerpt).

X slow requests, 0 included below; oldest blocked for > 33860.402867 secs
mds.icadmin006(mds.1): X slow requests are blocked > 30 secs
X clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
Health check update: X MDSs report slow requests (MDS_SLOW_REQUEST)
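
When this happens, these are roughly the commands I use to see what is stuck, as a sketch (replace icadmin006 with the affected MDS):

$ ceph health detail
$ ceph tell mds.icadmin006 ops          # dump in-flight / blocked MDS requests
$ ceph tell mds.icadmin006 session ls   # list client sessions and their caps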

What can I do to avoid this behaviour?

Some useful information:

Ceph has been deployed with ceph-ansible on Ubuntu 20.04, kernel 5.4.0-90-generic.

$ ceph -s
  cluster:
    id:     cc402f2e-2444-473e-adab-fe7b38d08546
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum icadmin006,icadmin007,icadmin008 (age 8w)
    mgr: icadmin008(active, since 2w), standbys: icadmin007, icadmin006
    mds: 2/2 daemons up, 1 standby
    osd: 110 osds: 110 up (since 20h), 110 in (since 7d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 4225 pgs
    objects: 31.84M objects, 24 TiB
    usage:   71 TiB used, 269 TiB / 340 TiB avail
    pgs:     4225 active+clean

$ ceph osd pool ls detail | grep cephfs
pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4096 pgp_num 4096 autoscale_mode on last_change 17466 lfor 0/0/213 flags hashpspool stripe_width 0 target_size_ratio 0.2 application cephfs
pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode off last_change 17555 lfor 0/17519/17525 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs

$ ceph fs status
cephfs - 76 clients
======
RANK  STATE     MDS         ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active    icadmin008  Reqs: 58 /s    261k   259k   5213   90.0k
 1    active    icadmin006  Reqs: 22 /s    176k   170k   30.2k  77.7k
      POOL           TYPE      USED   AVAIL
cephfs_metadata      metadata  45.8G  82.9T
cephfs_data          data      71.1T  82.9T
STANDBY MDS
 icadmin007
MDS version: ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)

Thanks for your help,

Best regards,

--
Yoann Moulin
EPFL IC-IT
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


