Re: Random CephFS freeze, osd bad authorize reply

Are the clocks dramatically out of sync? Basically any bug in signing could cause that kind of log message, but I think simple time-sync problems, so that the two sides end up using different keys, are the most common cause.
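
A quick way to check is to compare the clocks on the two affected clients against the MON/OSD hosts. The exact tooling depends on whether those boxes run ntpd or chrony, but something along these lines:

    # on each cluster node and on the two affected clients
    timedatectl
    chronyc tracking    # or: ntpq -p

    # the MONs will also warn about skew among themselves
    ceph health detail | grep -i clock

Note that the MON clock-skew warning only covers the monitors, so the client clocks need to be checked separately.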
On Mon, Jul 24, 2017 at 9:36 AM <topro@xxxxxx> wrote:
Hi,
 
I'm running a Ceph cluster which I started back in the Bobtail days and have kept running/upgrading over the years. It has three nodes, each running one MON, 10 OSDs and one MDS; one MDS is active and two are standby. The machines are 8-core Opterons with 32 GB of ECC RAM each. I'm using the cluster to host /home for our clients (about 25) via CephFS and as an RBD backend for a couple of libvirt VMs (about 5).
 
Currently I'm running 11.2.0 (kraken), and a couple of months ago I started experiencing some strange behaviour. Exactly two of my ~25 CephFS clients (always the same two) keep freezing their /home about one to two hours after first boot in the morning. At the moment of the freeze, syslog starts reporting loads of:
 
_hostname_ kernel: libceph: osdXX 172.16.0.XXX:68XX bad authorize reply
 
On one of the clients I replaced every single piece of hardware, including NIC, switch and network cabling, and did a complete OS reinstall, so that machine is completely new now. But the user still gets the same behaviour. As far as I can tell, key renegotiation is failing and the client keeps trying to connect with its old cephx key, but I cannot find a reason why this happens or how to fix it.
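
For reference, here is roughly what I have been looking at to check the ticket-renewal theory; the option name and commands are my assumption from the docs, so please correct me if this is the wrong place to look:

    # ticket lifetime on the cluster side (default should be 3600 s, which
    # would roughly match a freeze one to two hours after boot); run on a
    # MON host, with <id> being that MON's id
    ceph daemon mon.<id> config get auth_service_ticket_ttl

    # temporarily raise auth/messenger logging on the OSDs around the time
    # of a freeze to catch the failing authorize exchange
    ceph tell osd.* injectargs '--debug-auth 20 --debug-ms 1'

    # on the affected client (if the kernel has dynamic debug enabled),
    # log the libceph side of the renewal attempts
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control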
 
Biggest problem: the second affected machine is our CEO's, and if we can't fix this I will have a hard time explaining that Ceph is the way to go.
 
The two affected machines do not share any network segment other than the ToR switch in the Ceph rack, while there are other clients that do share a network segment with the affected machines but aren't affected at all.
 
Google doesn't help me on this one either; it seems no one else is experiencing anything similar.
 
All clients run Debian Jessie with the 4.9 backports kernel and use the kernel client to mount CephFS. I think the whole thing started with a kernel upgrade from one 4.x series to another, but I cannot reconstruct exactly when.
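
The mount itself is a plain kernel-client mount, roughly like the following (monitor addresses, client name and secret file below are placeholders, not the real ones):

    mount -t ceph mon1,mon2,mon3:/ /home \
        -o name=cephfs-client,secretfile=/etc/ceph/cephfs-client.secret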
 
Any help greatly appreciated.
 
Best regards,
Tobi
 
 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
