We run deep scrubs from cron via a script, so we know exactly when deep scrubs are happening. We've seen nodes fail both during deep scrubbing and while no deep scrubs were running, so I'm pretty sure it's not related.
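
For anyone wanting to do the same, a cron-driven scrub script could look roughly like the sketch below. This is not our actual script: the function names, the injectable `runner`/`log` parameters and the example PG ids are made up for illustration, and it assumes the `ceph` CLI is on the PATH with a usable admin keyring. Note that `ceph pg deep-scrub <pgid>` only asks the primary OSD to start a deep scrub and returns right away, so the timestamps record when a scrub was requested, not when it finished.

```python
import subprocess
from datetime import datetime


def deep_scrub_command(pgid):
    """Build the CLI call that requests a deep scrub of one placement group."""
    return ["ceph", "pg", "deep-scrub", pgid]


def run_deep_scrubs(pgids, runner=subprocess.run, log=print):
    """Request a deep scrub for each PG and log a timestamp for each request.

    `runner` and `log` are injectable (hypothetical parameters) so the
    function can be exercised without a live cluster.
    """
    for pgid in pgids:
        cmd = deep_scrub_command(pgid)
        log(f"{datetime.now().isoformat()} requesting deep scrub: {' '.join(cmd)}")
        # check=True raises CalledProcessError if the ceph CLI exits non-zero
        runner(cmd, check=True)


# Example use from a cron job (PG ids here are placeholders):
# run_deep_scrubs(["1.2f", "1.30"])
```

Keeping the timestamped log is what lets you line up scrub requests against the times nodes fell over.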
On Tue, Dec 8, 2015 at 2:42 AM, Benedikt Fraunhofer <fraunhofer@xxxxxxxxxx> wrote:
Hi Tom,
2015-12-08 10:34 GMT+01:00 Tom Christensen <pavera@xxxxxxxxx>:
> We didn't go forward to 4.2 as it's a large production cluster, and we just
> needed the problem fixed. We'll probably test out 4.2 in the next couple
Unfortunately we don't have the luxury of a test cluster.
And to add to that, we couldn't simulate the load, although it does not
seem to be load related.
Did you try running with nodeep-scrub as a short-term workaround?
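
In case it helps, toggling that flag can be scripted along these lines. It is only a sketch: the helper names and the injectable `runner` parameter are invented for illustration, and it assumes the `ceph` CLI is available with admin credentials. The underlying commands are `ceph osd set nodeep-scrub` and `ceph osd unset nodeep-scrub`. As far as I know the flag is cluster-wide and only stops new deep scrubs from being scheduled; scrubs already in progress run to completion.

```python
import subprocess


def nodeep_scrub_command(enable):
    """Build the CLI call that sets or clears the cluster-wide nodeep-scrub flag."""
    action = "set" if enable else "unset"
    return ["ceph", "osd", action, "nodeep-scrub"]


def set_nodeep_scrub(enable, runner=subprocess.run):
    """Set (enable=True) or clear (enable=False) the nodeep-scrub flag.

    `runner` is injectable (hypothetical parameter) so this can be tested
    without touching a real cluster.
    """
    # check=True raises CalledProcessError if the ceph CLI exits non-zero
    runner(nodeep_scrub_command(enable), check=True)


# Example: pause deep scrubs, then re-enable them later.
# set_nodeep_scrub(True)
# set_nodeep_scrub(False)
```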
I'll give ~30% of the nodes 4.2 and see how it goes.
> In our experience it takes about 2 weeks to start happening
We're well below that: somewhere between 1 and 4 days.
And yes, once one goes south, it affects the rest of the cluster.
Thx!
Benedikt
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com