CephFS Segfault 12.2.0

We have a cluster that was recently upgraded from Jewel to Luminous.
Today we had a segmentation fault in the MDS that left the file system
degraded.  Systemd then restarted the daemon over and over, and those
restarts crashed with a different stack trace (it can be seen after the
10k events in the log file[0]).

We tried to fail over to the standby, which also kept failing.  After
shutting down both MDSs for some time, we brought one back online, and
by then the clients seemed to have been out long enough to be evicted.
We were then able to reboot the clients (RHEL 7.4) and have them
reconnect to the file system.

2017-09-18 13:27:12.836699 7f9c0ca51700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9c0ca51700 thread_name:fn_anonymous

 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
 1: (()+0x590c21) [0x55a40867ac21]
 2: (()+0xf5e0) [0x7f9c17cb75e0]
 3: (Server::handle_client_readdir(boost::intrusive_ptr<MDRequestImpl>&)+0xbb9) [0x55a4083f74b9]
 4: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0x9c1) [0x55a408428591]
 5: (MDSInternalContextBase::complete(int)+0x1eb) [0x55a408605c0b]
 6: (void finish_contexts<MDSInternalContextBase>(CephContext*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >&, int)+0xac) [0x55a4083c69ac]
 7: (MDSCacheObject::finish_waiting(unsigned long, int)+0x46) [0x55a40861d856]
 8: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<MDSInternalContextBase*, std::allocator<MDSInternalContextBase*> >*)+0x10df) [0x55a40851f93f]
 9: (Locker::wrlock_finish(SimpleLock*, MutationImpl*, bool*)+0x310) [0x55a408521210]
 10: (Locker::_drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x22c) [0x55a408524adc]
 11: (Locker::drop_non_rdlocks(MutationImpl*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x59) [0x55a4085253d9]
 12: (Server::reply_client_request(boost::intrusive_ptr<MDRequestImpl>&, MClientReply*)+0x433) [0x55a4083f21a3]
 13: (Server::respond_to_request(boost::intrusive_ptr<MDRequestImpl>&, int)+0x459) [0x55a4083f2dd9]
 14: (Server::_unlink_local_finish(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*, unsigned long)+0x2ab) [0x55a4083fd7fb]
 15: (MDSIOContextBase::complete(int)+0xa4) [0x55a408605d44]
 16: (MDSLogContextBase::complete(int)+0x3c) [0x55a4086060fc]
 17: (Finisher::finisher_thread_entry()+0x198) [0x55a4086ba718]
 18: (()+0x7e25) [0x7f9c17cafe25]
 19: (clone()+0x6d) [0x7f9c16d9234d]
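
Backtraces like this one often arrive wrapped by a mail client or log
viewer, which makes them awkward to diff against the second trace in the
log.  A minimal Python sketch for rejoining the frames, assuming each
frame starts with its index and a colon and ends with a bracketed
address; parse_frames is a hypothetical helper written for this
illustration, not a Ceph tool:

```python
import re

# Start of a frame, e.g. " 3:" or " 12: (Server::reply_client_request(...".
FRAME_START = re.compile(r"^\s*(\d+):\s*(.*)$")
# Trailing return address, e.g. "[0x55a4083f74b9]".
ADDRESS = re.compile(r"\[(0x[0-9a-fA-F]+)\]\s*$")


def parse_frames(lines):
    """Rejoin wrapped backtrace lines into (index, symbol, address) tuples.

    Hypothetical helper: the frame layout is assumed from the trace above.
    The address is None when no bracketed address closes the frame.
    """
    joined = []
    current = None
    for line in lines:
        match = FRAME_START.match(line)
        if match:
            # A new frame begins; store the one collected so far.
            if current is not None:
                joined.append(current)
            current = [int(match.group(1)), match.group(2).strip()]
        elif current is not None:
            if not line.strip():
                break  # a blank line ends the backtrace
            # Continuation of a wrapped frame.
            current[1] = (current[1] + " " + line.strip()).strip()
    if current is not None:
        joined.append(current)

    frames = []
    for index, text in joined:
        addr = ADDRESS.search(text)
        if addr:
            frames.append((index, text[:addr.start()].strip(), addr.group(1)))
        else:
            frames.append((index, text, None))
    return frames


sample = """\
 3:
(Server::handle_client_readdir(boost::intrusive_ptr<MDRequestImpl>&)+0xbb9)
[0x55a4083f74b9]
 5: (MDSInternalContextBase::complete(int)+0x1eb) [0x55a408605c0b]
""".splitlines()

for index, symbol, address in parse_frames(sample):
    print(index, address, symbol)
```

Comparing only the symbol column between two traces sidesteps address
differences, which can change between daemon restarts.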


[0] -
https://obj.umiacs.umd.edu/derek_support/mds_20170918/ceph-mds.objmds01.log?Signature=VJB4qL34j5UKM%2BCxeiR8n0JA1gE%3D&Expires=1508357409&AWSAccessKeyId=936291C3OMB2LBD7FLK4

-- 
Derek T. Yarnell
Director of Computing Facilities
University of Maryland
Institute for Advanced Computer Studies
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


