We didn't go forward to 4.2 as it's a large production cluster, and we just needed the problem fixed. We'll probably test out 4.2 in the next couple of months, but this one slipped past us as it didn't occur in our test cluster until after we had upgraded production. In our experience it takes about 2 weeks to start happening, but once it does it's all hands on deck, because nodes are going to go down regularly.
All that being said, if/when we try 4.2 it's going to need to run rock solid for 1-2 months in our test cluster before it gets to production.
On Tue, Dec 8, 2015 at 2:30 AM, Benedikt Fraunhofer <fraunhofer@xxxxxxxxxx> wrote:
Hi Tom,
> We have been seeing this same behavior on a cluster that has been perfectly
> happy until we upgraded to the ubuntu vivid 3.19 kernel. We are in the
I can't recall when we gave 3.19 a shot, but now that you say it... The
cluster was happy for >9 months with 3.16.
Did you try 4.2, or do you think the regression introduced somewhere
between 3.16 and 3.19 is still in 4.2?
Thx!
Benedikt
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com