Re: Strange container restarts?

Things are a little more complex. Managed (container) resources are handled by systemd, which typically auto-restarts failed services. Ceph diverts the container logs (what you'd get from "docker logs container-id") into the systemd journal, so checking journalctl is advised, although crashing containers have a lamentable tendency not to log what took them down.
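
As a rough sketch of that journal check: cephadm names its systemd units ceph-<fsid>@<daemon>.service, so a small helper can build the unit name to pass to journalctl. The helper name ceph_unit and the daemon osd.12 are placeholders for illustration, not anything taken from this cluster.

```shell
# Build the systemd unit name cephadm uses for a daemon.
#   $1 = cluster fsid (as printed by 'ceph fsid')
#   $2 = daemon name, e.g. osd.12 (placeholder)
ceph_unit() {
    printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Typical use on the affected host (not run here):
#   unit=$(ceph_unit "$(ceph fsid)" osd.12)
#   journalctl -u "$unit" --since "1 hour ago"   # container output diverted to the journal
#   systemctl show -p NRestarts "$unit"          # how often systemd has restarted the unit
```

If NRestarts stays at 0 while the exec_died entries keep appearing, that would suggest the entries are not coming from actual service restarts.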

Some subsystems within a container have their own loggers, which can be configured. I think that includes, for example, Prometheus. In that case, it's important to ensure that the location they're set to log to is OUTSIDE the container. Otherwise they'll log to a file inside the container's own filesystem, which is discarded when the container is removed, and the evidence will be lost.

This is, of course, setting aside problems with the containers themselves. It's always prudent to ensure that there's plenty of spare RAM for the container to run in and that the root filesystem ("/") has enough free space to hold the container images, which can be quite large.
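
A quick way to eyeball both of those on a host, as a minimal sketch: the function names are made up for illustration, and it assumes a Linux host where image storage lives on "/" (with podman running as root the default is under /var/lib/containers, so check that filesystem instead if it is mounted separately).

```shell
# Percentage of the root filesystem currently in use (integer, no % sign).
root_used_pct() {
    # -P forces the POSIX one-line-per-filesystem layout; field 5 is "Capacity".
    df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Memory the kernel estimates is available for new work, in MiB (Linux only).
mem_available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

echo "root fs used:  $(root_used_pct)%"
echo "RAM available: $(mem_available_mib) MiB"
```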

    Tim

On 11/12/24 05:59, Eugen Block wrote:
I don't see OSD-related exec_died messages in Pacific, but on Quincy they are logged as well. I can trigger them simply by running 'cephadm ls', so it's just the regular check and there is no need to worry about it. They are not triggered if you only run 'cephadm ls --no-detail', but one would have to look through the code to understand what exactly the full ls command queries. As I wrote, this isn't an issue, just a regular check.
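
To reproduce that observation on another host, one could count the exec_died lines before and after a manual 'cephadm ls'. This is only a sketch: count_exec_died is a made-up helper name, and it assumes the entries land in /var/log/syslog as in the original report.

```shell
# Count lines mentioning exec_died in a log file (prints 0 if there are none).
count_exec_died() {
    # grep -c exits non-zero when nothing matches; '|| true' hides that.
    grep -c exec_died "$1" || true
}

# Typical use on a cluster host, as root (not run here):
#   before=$(count_exec_died /var/log/syslog)
#   cephadm ls > /dev/null
#   sleep 2    # give syslog a moment to catch up
#   after=$(count_exec_died /var/log/syslog)
#   echo "new exec_died entries: $((after - before))"
```

Repeating the same steps with 'cephadm ls --no-detail' should, going by the above, leave the count unchanged.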

Quoting Eugen Block <eblock@xxxxxx>:

Hi,

I haven't looked too deeply into it yet, but I think it's the regular cephadm check. The timestamps should match those in /var/log/ceph/cephadm.log, where you can see something like this:

cephadm ['--image', '{YOUR_REGISTRY}', 'ls']

It goes through your inventory and runs several 'gather-facts' commands and a couple more. I don't think you need to worry about this.

Regards,
Eugen

Quoting Jan Marek <jmarek@xxxxxx>:

Hello,

we have a Ceph cluster consisting of 12 hosts, and every host has
12 NVMe "disks".

On most of these hosts (9 of 12) we have errors in the logs; see the
attached file.

We tried to investigate this problem and found the following:

1) On every affected host, only one OSD shows this. So it is probably
not a general problem in version 18.2.2, because then it would appear
on the other OSDs as well, not on only one per host?

2) Sometimes one of these OSDs crashes :-( It seems that the crashed
OSDs come from the set of OSDs that have this problem.

3) The Ceph cluster runs OK and "doesn't know" about any problem
with these OSDs. It seems that this new instance of the ceph-osd
daemon tried to start either podman or conmon itself. We've tried
to check the PID files for conmon, but they seem to be OK?

4) We checked the 'ceph orch' command, but it does not try to
start these containers, because it knows that they exist and are running
('ceph orch ps' lists these containers as running).

5) I've tried to pause the orchestrator, but I still found these
entries in syslog... :-(

Please, is there any way to find out where the problem is
and how to stop this?

All of our Ceph hosts are provisioned by Ansible, so they all have
the same environment.

On every machine we have podman version 4.3.1+ds1-8+deb12u1 and
conmon version 2.1.6+ds1-1. OS is Debian bookworm.

The attached logs were prepared with:

grep exec_died /var/log/syslog

Sincerely
Jan Marek
--
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx