I'm still attempting to build a Ceph cluster and I'm currently getting nowhere very, very quickly. From what I can tell I have a slightly unstable setup, and I've yet to work out why.

I currently have 24 servers and I'm planning to increase this to around 48. The servers are in three groups, with a different type (and number) of disks in each group.

Currently I'm having an issue where, every time I add a new server, it adds the OSDs on the new node, and then a few random OSDs on the existing hosts all fall over; I can only get them up again by restarting the daemons. I'm using cephadm.

The network is a QDR-based InfiniBand network running IP over IB, so it's meant to be 40G, but currently it's behaving more like 10G (when I've tested it). It's still faster than the 1G management network I've also got.

The machines are mostly running Debian. There are a few machines running CentOS 7 that I mean to redeploy when I get the time (so I can upgrade to Pacific). I'm running Octopus 15.2.13.

I'm more than happy to change things; I'm still trying to learn, so there is no data that I care about quite yet. I was looking for more stability before I go there.

I really just want to know where to look for the problems rather than get any exact answers; I've yet to see any clues that might help.

Thanks in advance

Peter Childs
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx