Blocked requests during and after CephFS delete

Oliver Schulz <oschulz@xxxxxxxxxx> · Sun, 08 Dec 2013 16:16:36 +0100

Hello Ceph-Gurus,

A short while ago I reported some trouble we had with our cluster
suddenly going into a state of "blocked requests".

We did a few tests, and we can reproduce the problem:
During and after deleting a substantial chunk of data on
CephFS (a few TB), ceph health shows blocked requests such as:

    HEALTH_WARN 222 requests are blocked > 32 sec

This goes on for a couple of minutes, during which the cluster is
pretty much unusable. The number of blocked requests jumps around
(but seems to go down on average), until finally (after about 15
minutes in my last test) health is back to OK.
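
To track how the blocked-request count develops during such a test, a small polling script along these lines could be used. This is only a minimal sketch: it assumes the `ceph` CLI is on the PATH, the regular expression is based solely on the health line quoted above (other Ceph versions may word it differently), and the names `parse_blocked` and `poll_health` are made up for illustration.

```python
import re
import subprocess
import time

# Pattern taken from the health line quoted above; this wording is an
# assumption and may differ between Ceph versions.
BLOCKED_RE = re.compile(r"(\d+) requests are blocked > (\d+) sec")


def parse_blocked(health_line):
    """Return the number of blocked requests in a health line, or 0 if none are reported."""
    match = BLOCKED_RE.search(health_line)
    return int(match.group(1)) if match else 0


def poll_health(interval=10, max_polls=90):
    """Print a timestamped blocked-request count until health is OK or max_polls is reached."""
    for _ in range(max_polls):
        result = subprocess.run(["ceph", "health"], capture_output=True, text=True)
        line = result.stdout.strip()
        print(time.strftime("%H:%M:%S"), parse_blocked(line), line)
        if line.startswith("HEALTH_OK"):
            break
        time.sleep(interval)
```

Calling `poll_health()` in a second terminal while the delete is running would give a rough time series of the blocked-request count, which is easier to compare between test runs than occasional manual checks.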

I upgraded the cluster to Ceph Emperor (0.72.1) and repeated the
test, but the problem persists.

Is this normal - and if not, what might be the reason? Obviously,
having the cluster go on strike for a while after data deletion
is a bit of a problem, especially with a mixed application load.
The VMs running on RBDs aren't too happy about it, for example. ;-)

Our cluster structure: 6 nodes, each with 6x 3TB disks plus 1x system/journal
SSD, one OSD per disk. We're running Ceph version
0.72.1-1precise on Ubuntu 12.04.3 with kernel 3.8.0-33-generic
(x86_64). All active pools use replication factor 3.
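
For a sense of scale, the figures above work out as follows. This is back-of-the-envelope arithmetic only: it ignores filesystem and journal overhead, and the variable names are purely illustrative.

```python
# Cluster figures as stated above.
NODES = 6
DISKS_PER_NODE = 6      # one OSD per disk
DISK_TB = 3
REPLICATION = 3         # all active pools

osd_count = NODES * DISKS_PER_NODE      # OSDs in the cluster
raw_tb = osd_count * DISK_TB            # raw capacity in TB
usable_tb = raw_tb / REPLICATION        # rough upper bound on usable capacity
journals_per_ssd = DISKS_PER_NODE       # all OSD journals on a node share one SSD

print(osd_count, raw_tb, usable_tb, journals_per_ssd)
# prints: 36 108 36.0 6
```

With replication factor 3, every deleted object has to be removed on three OSDs, so deleting a few TB of CephFS data touches roughly three times that amount of raw storage.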

Any ideas?

Cheers,

Oliver
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com