On Fri, Nov 9, 2018 at 2:24 AM Kenneth Waegeman <kenneth.waegeman@xxxxxxxx> wrote:

Hi all,

On Mimic 13.2.1, we are seeing blocked ops on CephFS after removing some snapshots:
[root@osd001 ~]# ceph -s
  cluster:
    id:     92bfcf0a-1d39-43b3-b60f-44f01b630e47
    health: HEALTH_WARN
            5 slow ops, oldest one blocked for 1162 sec, mon.mds03 has slow ops

  services:
    mon: 3 daemons, quorum mds01,mds02,mds03
    mgr: mds02(active), standbys: mds03, mds01
    mds: ceph_fs-2/2/2 up {0=mds03=up:active,1=mds01=up:active}, 1 up:standby
    osd: 544 osds: 544 up, 544 in

  io:
    client: 5.4 KiB/s wr, 0 op/s rd, 0 op/s wr

[root@osd001 ~]# ceph health detail
HEALTH_WARN 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops
SLOW_OPS 5 slow ops, oldest one blocked for 1327 sec, mon.mds03 has slow ops

[root@osd001 ~]# ceph -v
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
Is this a known issue?
It's not exactly a known issue, but from the output and story you've got here it looks like the OSDs are deleting the snapshot data too fast and the MDS isn't getting replies back quickly enough. Or maybe you have an overlarge CephFS directory which is taking a long time to clean up somehow. You should get the MDS ops and the MDS' objecter ops in flight and see what specifically is taking so long.
-Greg
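To pull both views, assuming the active MDS in question is mds03 and you can reach its admin socket, something along these lines should work (standard admin socket commands; adjust the daemon name for your deployment):

# in-flight MDS operations and how long they've been waiting
ceph daemon mds.mds03 dump_ops_in_flight

# outstanding requests the MDS objecter has sent to the OSDs
ceph daemon mds.mds03 objecter_requests

If the objecter output shows many long-lived requests against the OSDs, that points at the snapshot trimming side rather than the MDS itself.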
We had a similar issue on Ceph 10.2 with RBD images. It was fixed by slowing down snapshot removal by adding this to ceph.conf:
[osd] osd snap trim sleep = 0.6
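A quick sketch of how you might try the same thing on a running Mimic cluster without restarting the OSDs, assuming the option name osd_snap_trim_sleep and a starting value of 0.6 (both worth tuning for your own hardware):

ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.6'

The sleep inserts a pause between snap trim operations on each OSD, so trimming takes longer overall but leaves more headroom for client and MDS traffic.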