On 15/04/2015 20:02, Kyle Hutson wrote:
I upgraded to 0.94.1 from 0.94 on Monday, and everything had been
going pretty well.
Then, about noon today, we had an mds crash. And then the failover mds
crashed. And this cascaded through all 4 mds servers we have.
If I try to start it ('service ceph start mds' on CentOS 7.1), it
appears to be OK for a little while: 'ceph -w' shows it going through
'replay', 'reconnect', 'rejoin', 'clientreplay', and 'active', but
almost immediately after reaching 'active' it crashes again.
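For reference, the restart-and-watch loop I'm repeating each time is
roughly this (run on the mds host; our daemon layout, adjust to taste):

  service ceph start mds   # CentOS 7.1 sysvinit wrapper
  ceph -w                  # watch the mds walk replay -> reconnect -> rejoin -> clientreplay -> active
  ceph mds stat            # one-shot view of the current mds state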
I have the mds log at
http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log
Some possibly, but not necessarily, useful background info:
- Yesterday we took our erasure-coded pool and increased both pg_num
and pgp_num from 2048 to 4096 (rough commands below). We still have
several objects misplaced (~17%), but those seem to be cleaning
themselves up.
- We are in the midst of a large (300+ TB) rsync from our old
(non-ceph) filesystem to this filesystem.
- Just before we noticed the mds crashes, we had changed the size
(replica count) of our metadata pool from 2 to 4.
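For reference, the pool changes were done with the usual
'ceph osd pool set' commands, roughly like this (pool names are
placeholders for ours):

  ceph osd pool set <ec-pool> pg_num 4096     # split PGs on the erasure-coded pool
  ceph osd pool set <ec-pool> pgp_num 4096    # let pgp_num follow so the data actually rebalances
  ceph osd pool set <metadata-pool> size 4    # raise the metadata pool's replica count from 2 to 4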
It looks like you're seeing http://tracker.ceph.com/issues/10449, which
is a situation where the SessionMap object becomes too big for the MDS
to save. The cause in that case was stuck requests from a misbehaving
client running a slightly older kernel.
Assuming you're using the kernel client and hitting a similar problem,
you could try to work around it by forcibly unmounting the clients
while the MDS is offline. During clientreplay the MDS will then drop
those sessions from the SessionMap after they time out, so the next
time it tries to save the map it won't be oversized. If that works, you
could then look into getting newer kernels onto the clients to avoid
hitting the issue again -- the #10449 ticket has some pointers about
which kernel changes were relevant.
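A rough sketch of that sequence, assuming the kernel mounts are at
/mnt/ceph (adjust the path and the mds id to your setup):

  # on each client, while the MDS is stopped:
  umount -f /mnt/ceph        # forced unmount of the ceph kernel mount
  umount -l /mnt/ceph        # lazy unmount as a fallback if -f hangs

  # once an MDS is active again, you can see how many sessions it is
  # carrying via its admin socket on the mds host:
  ceph daemon mds.<id> session ls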
Cheers,
John