Re: ceph-ansible in Pacific and beyond?

On 3/18/21 9:09 AM, Janne Johansson wrote:
On Wed, 17 Mar 2021 at 20:17, Matthew H <matthew.heler@xxxxxxxxxxx> wrote:

"A containerized environment just makes troubleshooting more difficult, getting access and retrieving details on Ceph processes isn't as straightforward as with a non containerized infrastructure. I am still not convinced that containerizing everything brings any benefits except the collocation of services."

It changes the way you troubleshoot, but I haven't found it more difficult for the issues I have seen and dealt with. Even today, without containers, all services can be colocated on the same hosts (MONs, MGRs, OSDs, MDSs). Is there a situation you've seen where that has not been the case?

New Ceph users pop into the #ceph IRC channel all the time with
absolutely no idea how to find the relevant logs from the
containerized services.

While you might not need much Ceph knowledge to get Ceph up and running, it does require users to know how container deployments work. I had to put in quite a bit of work to understand what ceph-ansible was doing to deploy the containers, and why it would fail (after some earlier attempts). You still need Ceph knowledge when things do not go as expected, and even beforehand, to make the right decisions on how to set up all the infrastructure. So arguably you need even more knowledge to understand what is going on under the hood, be it Ceph or containers.
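For reference, once you know where the container runtime puts things, getting at the logs is mostly a matter of a few commands. A rough sketch, assuming a cephadm- or ceph-ansible-style deployment (the daemon name osd.12 and the container name are illustrative; check what "cephadm ls" or "docker ps" reports on your hosts):

    # cephadm-based clusters: daemon logs go to journald by default
    cephadm ls                          # list daemons and their systemd units
    cephadm logs --name osd.12          # wrapper around journalctl for one daemon
    journalctl -u ceph-<fsid>@osd.12    # the same thing, via systemd directly

    # ceph-ansible (docker/podman) clusters: ask the container runtime
    docker ps                           # find the OSD container; names vary by version
    docker logs <osd-container-name>    # the daemon's stdout/stderr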


Being one of the people who do run services on bare metal (and
VMs), I actually can't help them, and it seems several other old Ceph
admins can't either.

Not that it is impossible, or even necessarily hard, to get at them,
but somewhere in the "it is so easy to get it up and running, just
pop a container and off you go" docs there seems to be a gap around
"when the OSD crashes at boot, run this to export the file normally
called /var/log/ceph/ceph-osd.12.log". So it becomes a black box to
the users, and they are left to wipe and reinstall (or something
else) when it doesn't work. In the end, I guess the project will see
fewer useful reports with "assert failed" logs from impossible
conditions, and more people turning away from something that could
have been fixed in the long run.

There is a Ceph Manager module for that: https://docs.ceph.com/en/latest/mgr/crash/
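A minimal sketch of using it (the crash ID is whatever "ceph crash ls" prints):

    ceph crash ls                  # list crash dumps the module has collected
    ceph crash info <crash-id>     # metadata and backtrace for a single crash
    ceph crash archive <crash-id>  # acknowledge it so RECENT_CRASH clears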

I guess an option to "always send crash logs to Ceph" could be built in. If you trust Ceph with this data, of course (opt-in).
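For what it's worth, the telemetry manager module already does something close to this: it can send anonymized crash reports upstream, strictly opt-in. A rough sketch (the license flag is required on recent releases):

    ceph telemetry show                                    # preview exactly what would be reported
    ceph telemetry on --license sharing-1-0                # opt in to reporting
    ceph config set mgr mgr/telemetry/channel_crash true   # crash channel (on by default once enabled)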



I get some of the advantages, and for stateless services elsewhere
containers might be gold, but I am not equally enthusiastic about
them for Ceph.


Yeah, so I think it's good to discuss the pros and cons and see what problems it solves, and what extra problems it creates.

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


