rbd client affected with only one node down

Hi,
I need a little bit help.
We have a 4-node Ceph cluster, and the clients run into trouble if one
node is down (due to maintenance).

After the node is switched on again, "ceph health" shows (for a short time):
HEALTH_WARN 4 pgs incomplete; 14 pgs peering; 370 pgs stale; 12 pgs
stuck unclean; 36 requests are blocked > 32 sec; nodown flag(s) set

nodown is set because of the maintenance, and the following is defined in
the global section of ceph.conf to protect against such things:
osd pool default min size = 1 # Allow writing one copy in a degraded state.
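For reference, the relevant fragment of our ceph.conf looks roughly like this (a sketch; only the min_size line is from our actual config, and the nodown flag itself is set at runtime with "ceph osd set nodown" before the maintenance and cleared with "ceph osd unset nodown" afterwards):

```ini
[global]
# Allow client writes to proceed with a single surviving replica
# while the pool is in a degraded state.
osd pool default min size = 1
```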


And in the logfile I see messages like:
2014-01-21 18:00:18.566712 osd.46 172.20.2.14:6821/12805 17 : [WRN] 6
slow requests, 3 included below; oldest blocked for > 180.734141 secs
2014-01-21 18:00:18.566717 osd.46 172.20.2.14:6821/12805 18 : [WRN] slow
request 120.523231 seconds old, received at 2014-01-21

Based on this message:
2014-01-21 18:00:21.126693 mon.0 172.20.2.11:6789/0 410241 : [INF] pgmap
v8331119: 4808 pgs: 4805 active+clean, 1 active+clean+scrubbing, 2
active+clean+scrubbing+deep; 57849 GB data, 113 TB used, 77841 GB / 189
TB avail; 2304 B/s wr, 0 op/s
I assume it has something to do with scrubbing and not with the writes
from the VMs?

Are there any switches that protect against this behavior?


regards

Udo
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
