On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:
Hi all,
On Mimic 13.2.1, we are seeing blocked ops on cephfs after removing some
snapshots:
[root@osd001 ~]# ceph -s
  cluster:
    id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
            5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has slow ops

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up {0=mds03=up:active,1=mds01=up:active}, 1 up:standby
    osd: 544 osds: 544 up, 544 in

  io:
    client: 5.4 KiB/s wr, 0 op/s rd, 0 op/s wr
[root@osd001 ~]# ceph health detail
HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
[root@osd001 ~]# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
Is this a known issue?
It's not exactly a known issue, but from the output and the story you've got here it looks like either the OSDs are deleting the snapshot data too quickly and the MDS isn't getting replies back fast enough, or you have an overlarge CephFS directory that is somehow taking a long time to clean up. You should dump the MDS ops and the MDS' objecter ops in flight and see what specifically is taking so long.
-Greg
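The in-flight ops Greg mentions can be pulled from each daemon's admin socket. A sketch, assuming you run these on the hosts where the daemons live and that the daemon names match the cluster above (mds03, mds01); adjust the names to your deployment:

```shell
# Ops the active MDS is currently processing (run on the MDS host):
ceph daemon mds.mds03 dump_ops_in_flight

# RADOS requests the MDS has outstanding to the OSDs:
ceph daemon mds.mds03 objecter_requests

# The monitor that is reporting slow ops can be inspected the same way:
ceph daemon mon.mds03 ops
```

If the objecter output shows long-lived requests against the same OSDs, those OSDs are the place to look next (e.g. with ceph daemon osd.N dump_ops_in_flight on the relevant hosts).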
Cheers,
Kenneth
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com