Rogue osd / CephFS / Adding osd


 



Hi all!

We are seeing strange behavior on two clusters we run at work (both v15.2.9 / CentOS 7.9):


  *   On the 1st cluster we are getting errors about multiple degraded pgs, all of them linked to a "rogue" osd with a very large ID ("osd.2147483647"). This osd doesn't show up in "ceph osd tree", and, even weirder, it doesn't always appear (roughly every 5-10 minutes)... but when it does, a lot of pgs become degraded.

  *   On the 2nd cluster we are serving CephFS, and after some user complaints we saw that ceph-fuse tries to connect to some osds over the wrong network (cluster network instead of public network). This behavior is random: about 90% of ceph-fuse connections to osds go over the public network, but the rest try to reach the osds through the cluster network. Since the cluster network is not reachable from the clients, those connections go stale, and the only way to recover is to "kill -9" the ceph-fuse mount.

  *   The last thing we are facing, on both clusters, is that when we add a new osd, half the time another osd on a different server goes down, and the only way to bring it back up is to reweight it to 0, zap it and re-add it (which can then lead to yet another osd failing...).
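On the first point: 2147483647 is 2^31 - 1, which Ceph uses as the CRUSH "none" sentinel, so "osd.2147483647" in a pg's up/acting set usually means CRUSH failed to map that replica to any real OSD rather than pointing at an actual daemon. A minimal sketch of how one could spot the affected pgs from "ceph pg dump --format json" output (the "pg_stats" key and the sample data below are assumptions; the JSON layout varies between releases):

```python
import json

# Assumption: 2147483647 (2^31 - 1) is CRUSH's "none" sentinel, shown in a
# pg's up/acting set when CRUSH could not place a replica on a real OSD.
CRUSH_ITEM_NONE = 2147483647

def find_unmapped_pgs(pg_dump_json):
    """Return pgids whose up or acting set contains the 'none' sentinel.

    Expects JSON shaped like `ceph pg dump --format json` with a
    top-level "pg_stats" list (layout assumed; adjust for your release)."""
    stats = json.loads(pg_dump_json)["pg_stats"]
    return [pg["pgid"] for pg in stats
            if CRUSH_ITEM_NONE in pg["up"] + pg["acting"]]

# Made-up sample fragment, for illustration only.
sample = json.dumps({"pg_stats": [
    {"pgid": "1.0", "up": [3, 5], "acting": [3, 5]},
    {"pgid": "1.1", "up": [2147483647, 4], "acting": [4]},
]})
print(find_unmapped_pgs(sample))  # → ['1.1']
```

If that matches what we see, the intermittent degradation would point at CRUSH mapping trouble (e.g. crush rule vs. available OSDs) rather than a phantom daemon.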
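On the second point, clients only learn OSD addresses from the osdmap, so a sketch like the following could check whether any OSD is advertising a cluster-network address as its public address (the CIDRs are placeholders for our networks, and the "public_addr" string format is assumed from older releases; newer JSON may use an "addrvec" instead):

```python
import json
import ipaddress

# Placeholder network, NOT from the original report: substitute your own
# cluster-network CIDR here.
CLUSTER_NET = ipaddress.ip_network("192.168.0.0/24")

def misadvertised_osds(osd_dump_json):
    """Return OSD ids whose advertised public address lies inside the
    cluster network, which would make clients dial the wrong interface.

    Expects JSON shaped like `ceph osd dump --format json` with an "osds"
    list and a legacy "public_addr" string "ip:port/nonce" (assumed)."""
    bad = []
    for osd in json.loads(osd_dump_json)["osds"]:
        ip = osd["public_addr"].split(":")[0]  # strip port/nonce
        if ipaddress.ip_address(ip) in CLUSTER_NET:
            bad.append(osd["osd"])
    return bad

# Made-up sample fragment, for illustration only.
sample_dump = json.dumps({"osds": [
    {"osd": 0, "public_addr": "10.0.0.11:6800/0"},
    {"osd": 1, "public_addr": "192.168.0.12:6800/0"},
]})
print(misadvertised_osds(sample_dump))  # → [1]
```

If any OSDs show up here, it would explain why a fraction of ceph-fuse sessions randomly hang: whichever mount happens to target one of those OSDs gets an unreachable address.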

Any insights, suggestions or feedback would be greatly appreciated!

Best regards,

--
Thierry
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




