Re: All OSDs on one host down

Indeed, if you upgrade Docker, for example via APT unattended-upgrades, the Docker daemon gets restarted, and all your containers with it :( That's just how Docker works.

You might want to switch to podman instead of Docker in order to avoid that. I use podman precisely for this reason.
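
If you can't switch container runtimes right away, a stopgap is to keep unattended-upgrades from ever upgrading (and therefore restarting) the Docker packages. A minimal sketch, assuming the stock Ubuntu 20.04 apt layout and that docker.io / docker-ce / containerd.io are the package names in use on your hosts (the drop-in filename is only an example):

cat <<'EOF' | sudo tee /etc/apt/apt.conf.d/52unattended-upgrades-docker
// Keep unattended-upgrades from upgrading (and so restarting) the container runtime.
Unattended-Upgrade::Package-Blacklist {
    "docker.io";
    "docker-ce";
    "containerd.io";
};
EOF

As far as I remember, cephadm will also pick up podman on its own once Docker is no longer installed, but do test that on a non-production host first.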

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, August 7th, 2021 at 11:04 AM, Andrew Walker-Brown <andrew_jbrown@xxxxxxxxxxx> wrote:

> Thanks David,
>
> Spent some more time digging in the logs/google. Also had a further 2 nodes fail this morning (different nodes).
>
> Looks like it’s related to apt auto-updates on Ubuntu 20.04, although we don’t run unattended upgrades. Docker appears to get a terminate signal which shuts down/restarts all the containers, but some don’t come back cleanly. There were also some legacy, unused interfaces/bonds in the netplan config.
>
> Anyway, cleaned all that up...so hopefully it’s resolved.
>
> Cheers,
>
> A.
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> From: David Caro <dcaro@xxxxxxxxxxxxx>
>
> Sent: 06 August 2021 09:20
>
> To: Andrew Walker-Brown <andrew_jbrown@xxxxxxxxxxx>
>
> Cc: Marc <Marc@xxxxxxxxxxxxxxxxx>; ceph-users@ceph.io <ceph-users@xxxxxxx>
>
> Subject: Re:  Re: All OSDs on one host down
>
> On 08/06 07:59, Andrew Walker-Brown wrote:
>
> > Hi Marc,
> >
> > Yes, I’m probably doing just that.
> >
> > The ceph admin guides aren’t exactly helpful on this. The cluster was deployed using cephadm and it’s been running perfectly until now.
> >
> > Wouldn’t running “journalctl -u ceph-osd@5” on host ceph-004 show me the logs for osd.5 on that host?
>
> On my containerized setup, the services that cephadm created are:
>
> dcaro@node1:~ $ sudo systemctl list-units | grep ceph
>
> ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@crash.node1.service loaded active running Ceph crash.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
>
> ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mgr.node1.mhqltg.service loaded active running Ceph mgr.node1.mhqltg for d49b287a-b680-11eb-95d4-e45f010c03a8
>
> ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mon.node1.service loaded active running Ceph mon.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
>
> ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.3.service loaded active running Ceph osd.3 for d49b287a-b680-11eb-95d4-e45f010c03a8
>
> ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.7.service loaded active running Ceph osd.7 for d49b287a-b680-11eb-95d4-e45f010c03a8
>
> system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice loaded active active system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice
>
> ceph-d49b287a-b680-11eb-95d4-e45f010c03a8.target loaded active active Ceph cluster d49b287a-b680-11eb-95d4-e45f010c03a8
>
> ceph.target loaded active active All Ceph clusters and services
>
> where the string after 'ceph-' is the fsid of the cluster.
>
> Hope that helps (you can also use systemctl list-units to find the corresponding units on your own hosts).
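>
> So for your osd.5 on ceph-004, something along these lines should do the trick (just a sketch, substitute your own cluster's fsid, which `ceph fsid` or `cephadm ls` will show you):
>
> fsid=$(sudo ceph fsid)
> sudo journalctl -u "ceph-${fsid}@osd.5.service" -n 100
> sudo systemctl restart "ceph-${fsid}@osd.5.service"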
>
> > Cheers,
> >
> > A
> >
> > Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
> >
> > From: Marc <Marc@xxxxxxxxxxxxxxxxx>
> >
> > Sent: 06 August 2021 08:54
> >
> > To: Andrew Walker-Brown <andrew_jbrown@xxxxxxxxxxx>; ceph-users@ceph.io <ceph-users@xxxxxxx>
> >
> > Subject: RE: All OSDs on one host down
> >
> > > I’ve tried restarting one of the OSDs but that fails; journalctl shows
> > >
> > > osd not found... not convinced I’ve got the systemctl command right.
> >
> > You are mixing 'non-container commands' with 'container commands'. That is, if you run that journalctl outside of the container, it will of course not find anything.
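> >
> > If you want to stay on the 'container side' without poking into the container yourself, cephadm can do that lookup for you. A rough sketch (assuming a cephadm-deployed cluster; run it on the host that owns the OSD):
> >
> > sudo cephadm ls | grep -B1 -A3 '"osd.5"'   # confirm the daemon really lives on this host
> > sudo cephadm logs --name osd.5 -- -n 50    # wraps journalctl for the containerized unit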
> >
>
> David Caro
>
> SRE - Cloud Services
>
> Wikimedia Foundation https://wikimediafoundation.org/
>
> PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
>
> "Imagine a world in which every single human being can freely share in the
>
> sum of all knowledge. That's our commitment."
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



