Re: Ceph health warn MDS failing to respond to cache pressure

Hi Webert,

      Thanks for your reply. Can you please suggest a ceph pg value for the data and metadata pools? I have set 128 for data and 128 for metadata; is this correct?
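
For reference, the current values can be checked like this (the pool names below are assumptions; list yours with "ceph osd lspools"):

 # pool names are assumptions - adjust to your setup
 ceph osd pool get cephfs_data pg_num
 ceph osd pool get cephfs_metadata pg_num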


Regards
Prabu GJ


---- On Thu, 04 May 2017 17:04:38 +0530 Webert de Souza Lima <webert.boss@xxxxxxxxx> wrote ----

I have faced the same problem many times. Usually it doesn't cause any harm, but twice I had a 30-minute system outage because of it.
It might be caused by the number of inodes on your Ceph filesystem. Go to the MDS server and run the following (supposing your MDS server id is intcfs-osd1):

 ceph daemon mds.intcfs-osd1 perf dump mds

Look for the inode_max and inodes values:
inode_max is the maximum number of inodes to cache, and inodes is the number of inodes currently in the cache.
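
A trimmed example of the relevant part of the output (the numbers here are purely illustrative; 100000 is the default cache size, counted in inodes):

 "inode_max": 100000,
 "inodes": 99992,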

If the cache is full, mount the CephFS with the "-o dirstat" option and cat the mount point, for example:

 mount -t ceph 10.0.0.1:6789:/ /mnt -o dirstat,name=admin,secretfile=/etc/ceph/admin.secret
 cat /mnt
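
The output looks something like this (all numbers are illustrative); rentries is the recursive count of all entries under the mount point:

 entries:   5
  files:    2
  subdirs:  3
  rentries: 1180931
  rfiles:   1080023
  rsubdirs: 100908
  rbytes:   5726359596048
  rctime:   1493900400.000000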

Look at the rentries number. If it is larger than inode_max, raise the "mds cache size" option in ceph.conf to a number that fits and restart the MDS (beware: this will cause CephFS to stall for a while; do it at your own risk).
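
A minimal sketch of the change, assuming the MDS runs under systemd; the value below is a placeholder, pick one that covers your rentries and fits in RAM:

 # ceph.conf on the MDS host; the value is an assumption
 [mds]
 mds cache size = 2000000

 # then restart the MDS (command assumes systemd)
 systemctl restart ceph-mds@intcfs-osd1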

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
Belo Horizonte - Brasil

On Thu, May 4, 2017 at 3:28 AM, gjprabu <gjprabu@xxxxxxxxxxxx> wrote:



Hi Team,

          We are running CephFS with 5 OSDs, 3 MONs and 1 MDS. There is a HEALTH_WARN, "failing to respond to cache pressure". Kindly advise how to fix this issue.


cluster b466e09c-f7ae-4e89-99a7-99d30eba0a13
     health HEALTH_WARN
            mds0: Client integ-hm8-1.csez.zohocorpin.com failing to respond to cache pressure
            mds0: Client integ-hm5 failing to respond to cache pressure
            mds0: Client integ-hm9 failing to respond to cache pressure
            mds0: Client integ-hm2 failing to respond to cache pressure
            election epoch 16, quorum 0,1,2 intcfs-mon3,intcfs-mon1,intcfs-mon2
      fsmap e79409: 1/1/1 up {0=intcfs-osd1=up:active}, 1 up:standby
     osdmap e3343: 5 osds: 5 up, 5 in
            flags sortbitwise
      pgmap v13065759: 564 pgs, 3 pools, 5691 GB data, 12134 kobjects
            11567 GB used, 5145 GB / 16713 GB avail
                 562 active+clean
                   2 active+clean+scrubbing+deep
  client io 8090 kB/s rd, 29032 kB/s wr, 25 op/s rd, 129 op/s wr


Regards
Prabu GJ



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
