I'm new to Ceph, and I've been trying to set up a new cluster of 16 machines, each with 30 disks and 6 SSDs (plus boot disks), 256 GB of memory and InfiniBand networking. (OK, it's currently 15, but never mind.)

When I take them beyond about 10 OSDs each, the hosts start having problems starting the OSDs. I can normally fix this by rebooting the host, after which it will carry on for a while, and with a bit of poking around it is possible to get them up to the full complement. (Once it's working it's fine, unless you start adding services or moving the OSDs around.)

Is there anything I can change to make it a bit more stable? I've already set

fs.aio-max-nr = 1048576
kernel.pid_max = 4194303
fs.file-max = 500000

which made it a bit better, but I feel it could be better still.

I'm currently trying to upgrade to 15.2.9 from the default cephadm version of Octopus, and the upgrade is going very, very slowly.

I'm using Podman, if that helps; I'm not sure whether Docker would be better. (I've mainly used Singularity when I've handled containers before.)

Thanks in advance

Peter Childs

ceph-users mailing list -- ceph-users@xxxxxxx
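Because the OSD start-up problems are cleared by rebooting, it may be worth confirming that the three kernel settings quoted above are still in effect on every host afterwards. Below is a minimal sketch, assuming a Linux host that exposes them under /proc/sys. The names `TARGETS`, `proc_path`, `read_value` and `below_target` are hypothetical helpers for illustration, not part of Ceph or cephadm.

```python
from pathlib import Path

# Target values quoted in the post (hypothetical dict name).
TARGETS = {
    "fs.aio-max-nr": 1048576,
    "kernel.pid_max": 4194303,
    "fs.file-max": 500000,
}


def proc_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key such as 'fs.aio-max-nr' to its file under root."""
    return Path(root, *key.split("."))


def read_value(key: str, root: str = "/proc/sys") -> int:
    """Read a sysctl value; only the first whitespace-separated field is used."""
    return int(proc_path(key, root).read_text().split()[0])


def below_target(targets: dict, root: str = "/proc/sys") -> dict:
    """Return {key: (current, target)} for every setting currently below its target."""
    short = {}
    for key, target in targets.items():
        current = read_value(key, root)
        if current < target:
            short[key] = (current, target)
    return short
```

Running `print(below_target(TARGETS))` on an OSD host should print an empty dict if all three values are at or above the targets; any key it lists has not been applied (or was reset) on that host.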