I upgraded from 0.94 to 0.94.1 on Monday, and everything had been going pretty well.
Then, around noon today, one mds crashed. The failover mds then crashed as well, and this cascaded through all 4 mds servers we have.
If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while: ceph -w shows it going through 'replay', 'reconnect', 'rejoin', 'clientreplay', and 'active', but almost immediately after reaching 'active' it crashes again.
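In case the exact sequence matters, this is roughly the cycle I've been repeating (nothing here beyond stock commands; the ceph.conf debug settings at the end are just what I'd add if more verbose mds logging would help):

    service ceph start mds     # CentOS 7.1
    ceph -w                    # watch replay -> reconnect -> rejoin -> clientreplay -> active
    ceph mds stat              # reports active, then the daemon drops out again

    # in ceph.conf, before the next restart, if more logging is wanted:
    [mds]
        debug mds = 20
        debug ms = 1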
I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log
Some possibly (but not necessarily) relevant background info:
- Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. About 17% of objects are still misplaced, but they appear to be steadily cleaning themselves up.
- We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem.
- Shortly before we noticed the mds crashes, we had changed the size (replica count) of our metadata pool from 2 to 4. Rough commands for both pool changes are sketched below.
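For completeness, the pool changes above were done with commands roughly like these (pool names are placeholders for ours):

    ceph osd pool set <ec-pool> pg_num 4096
    ceph osd pool set <ec-pool> pgp_num 4096      # done yesterday; ~17% of objects still misplaced
    ceph osd pool set <metadata-pool> size 4      # replica count 2 -> 4, shortly before the crashes
    ceph -s                                       # watching the misplaced objects drain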