On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
Hi all,
On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some
snapshots:
[root@osd001 ~]# ceph -s
  cluster:
    id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
            5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has slow ops

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up {0=mds03=up:active,1=mds01=up:active}, 1 up:standby
    osd: 544 osds: 544 up, 544 in

  io:
    client: 5.4 KiB/s wr, 0 op/s rd, 0 op/s wr
[root@osd001 ~]# ceph health detail
HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
[root@osd001 ~]# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
Is this a known issue?
It's not exactly a known issue, but from the output and the story you've got here it looks like either the OSDs are deleting the snapshot data too quickly and the MDS isn't getting replies back fast enough, or you have an overlarge CephFS directory that is somehow taking a long time to clean up. You should dump the MDS ops and the MDS' objecter ops in flight and see what specifically is taking so long.
-Greg
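The in-flight ops Greg mentions can be pulled from each daemon's admin socket. A sketch, assuming you run these on the hosts where the daemons live and that the daemon names match the cluster above (mds03, mds01); adjust the names to your deployment:

```shell
# Ops the active MDS is currently processing (run on the MDS host):
ceph daemon mds.mds03 dump_ops_in_flight

# RADOS requests the MDS has outstanding to the OSDs:
ceph daemon mds.mds03 objecter_requests

# The monitor that is reporting slow ops can be inspected the same way:
ceph daemon mon.mds03 ops
```

If the objecter output shows long-lived requests against the same OSDs, those OSDs are the place to look next (e.g. with ceph daemon osd.N dump_ops_in_flight on the relevant hosts).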
Cheers,
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com