Hi,

I need a little help. We have a 4-node Ceph cluster, and the clients run into trouble if one node is down (due to maintenance).

After the node is switched on again, ceph health shows (for a short time):

HEALTH_WARN 4 pgs incomplete; 14 pgs peering; 370 pgs stale; 12 pgs stuck unclean; 36 requests are blocked > 32 sec; nodown flag(s) set

nodown is set due to maintenance, and in the global section of ceph.conf the following is defined to protect against such things:

osd pool default min size = 1   # Allow writing one copy in a degraded state.

And in the logfile I see messages like:

2014-01-21 18:00:18.566712 osd.46 172.20.2.14:6821/12805 17 : [WRN] 6 slow requests, 3 included below; oldest blocked for > 180.734141 secs
2014-01-21 18:00:18.566717 osd.46 172.20.2.14:6821/12805 18 : [WRN] slow request 120.523231 seconds old, received at 2014-01-21

Due to the message:

2014-01-21 18:00:21.126693 mon.0 172.20.2.11:6789/0 410241 : [INF] pgmap v8331119: 4808 pgs: 4805 active+clean, 1 active+clean+scrubbing, 2 active+clean+scrubbing+deep; 57849 GB data, 113 TB used, 77841 GB / 189 TB avail; 2304 B/s wr, 0 op/s

I assume it has something to do with scrubbing and not with the writes from the VMs? Are there any switches that protect against this behaviour?

regards

Udo
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
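
[Editor's note: for reference, a minimal sketch of the cluster flags and inspection commands that usually come up in this kind of maintenance window, assuming the standard ceph CLI and the default admin-socket path; osd.46 is only taken from the log excerpt above, and whether scrubbing is actually the cause is not confirmed by the post.]

# Before taking a node down for planned maintenance: prevent rebalancing,
# and (if scrubbing is suspected of competing with client I/O) pause scrubs.
ceph osd set noout          # keep OSDs "in" so no backfill starts
ceph osd set noscrub        # suspend normal scrubs
ceph osd set nodeep-scrub   # suspend deep scrubs

# While requests are blocked, see which PGs and OSDs are affected:
ceph health detail
ceph pg dump_stuck unclean

# Inspect the slow requests on the OSD that logged the warnings
# (socket path is the default one; adjust to your installation):
ceph --admin-daemon /var/run/ceph/ceph-osd.46.asok dump_ops_in_flight

# Once the node is back and the PGs are active+clean again:
ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph osd unset noout

[Note also that the Ceph documentation generally suggests noout rather than nodown for planned maintenance: with nodown set, peers keep sending requests to OSDs that are actually offline, which can itself show up as blocked requests during the maintenance window.]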