Re: mds crashing

On 15/04/2015 20:02, Kyle Hutson wrote:
I upgraded to 0.94.1 from 0.94 on Monday, and everything had been going pretty well.

Then, about noon today, we had an mds crash. And then the failover mds crashed. And this cascaded through all 4 mds servers we have.

If I try to start it ('service ceph start mds' on CentOS 7.1), it appears to be OK for a little while. ceph -w goes through 'replay' 'reconnect' 'rejoin' 'clientreplay' and 'active' but nearly immediately after getting to 'active', it crashes again.

I have the mds log at http://people.beocat.cis.ksu.edu/~kylehutson/ceph-mds.hobbit01.log

Some possibly, but not necessarily, useful background info:
- Yesterday we took our erasure coded pool and increased both pg_num and pgp_num from 2048 to 4096. We still have several objects misplaced (~17%), but those seem to be continuing to clean themselves up.
- We are in the midst of a large (300+ TB) rsync from our old (non-ceph) filesystem to this filesystem.
- Before we noticed the mds crashes, we had just changed the size of our metadata pool from 2 to 4.

It looks like you're seeing http://tracker.ceph.com/issues/10449, which is a situation where the SessionMap object becomes too big for the MDS to save. In that case the cause was stuck requests from a misbehaving client running a slightly older kernel.

Assuming you're using the kernel client and having a similar problem, you could try to work around this situation by forcibly unmounting the clients while the MDS is offline, such that during clientreplay the MDS will remove them from the SessionMap after timing out, and then next time it tries to save the map it won't be oversized. If that works, you could then look into getting newer kernels on the clients to avoid hitting the issue again -- the #10449 ticket has some pointers about which kernel changes were relevant.
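
As a rough illustration of the order of operations described above, here is a minimal dry-run sketch in Python. It only builds and prints the commands; it does not run anything. The client host names and the mount point are hypothetical placeholders, the use of ssh to reach each machine is an assumption, and 'umount -f' is assumed to be an acceptable way to force the unmount on your clients. The 'service ceph ... mds' form is taken from the original post.

```python
import shlex


def build_workaround_plan(mds_hosts, client_hosts, mount_point):
    """Return the shell commands, in order, for the workaround (dry run only)."""
    plan = []
    # 1. Keep every MDS offline so that no client session can reconnect.
    for host in mds_hosts:
        plan.append(f"ssh {shlex.quote(host)} service ceph stop mds")
    # 2. Forcibly unmount CephFS on each kernel client while the MDS is down.
    for host in client_hosts:
        plan.append(
            f"ssh {shlex.quote(host)} umount -f {shlex.quote(mount_point)}"
        )
    # 3. Restart the MDS daemons; the stale sessions should then time out
    #    during clientreplay and drop out of the SessionMap.
    for host in mds_hosts:
        plan.append(f"ssh {shlex.quote(host)} service ceph start mds")
    return plan


if __name__ == "__main__":
    # "client01", "client02" and "/mnt/cephfs" are made-up example values.
    for command in build_workaround_plan(
        ["hobbit01"], ["client01", "client02"], "/mnt/cephfs"
    ):
        print(command)
```

Review the printed commands before running any of them by hand, and watch 'ceph -w' after the restart to see whether the MDS stays in 'active' this time.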

Cheers,
John