Hi Andrew,
we have had bad experiences with Ubuntu's auto-update, especially when it
updates packages for systemd, dbus and docker.
For example: one effect was internal communication errors; only a
restart of the node helped.
Cheers, Joachim
___________________________________
Clyso GmbH - Ceph Foundation Member
support@xxxxxxxxx
https://www.clyso.com
On 07.08.2021 at 11:04, Andrew Walker-Brown wrote:
Thanks David,
Spent some more time digging in the logs/Google. Also had a further 2 nodes fail this morning (different nodes).
Looks like it's related to apt auto-updates on Ubuntu 20.04, although we don't run unattended upgrades. Docker appears to get a terminate signal, which shuts down/restarts all the containers, but some don't come back cleanly. There were also some legacy unused interfaces/bonds in the netplan config.
Anyway, cleaned all that up... so hopefully it's resolved.
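For anyone hitting the same thing, a rough sketch of what to check (assuming stock Ubuntu 20.04 apt timers and the docker-ce packaging; adjust to your setup):

    # see whether the apt auto-update timers are active on the node
    systemctl list-timers 'apt-daily*'
    # optionally disable them entirely
    sudo systemctl disable --now apt-daily.timer apt-daily-upgrade.timer

Docker's live-restore option can also keep containers running while dockerd itself is restarted or upgraded; put the following in /etc/docker/daemon.json and reload the daemon:

    { "live-restore": true }

    sudo systemctl reload docker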
Cheers,
A.
From: David Caro<mailto:dcaro@xxxxxxxxxxxxx>
Sent: 06 August 2021 09:20
To: Andrew Walker-Brown<mailto:andrew_jbrown@xxxxxxxxxxx>
Cc: Marc<mailto:Marc@xxxxxxxxxxxxxxxxx>; ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
Subject: Re: Re: All OSDs on one host down
On 08/06 07:59, Andrew Walker-Brown wrote:
Hi Marc,
Yes, I’m probably doing just that.
The ceph admin guides aren’t exactly helpful on this. The cluster was deployed using cephadm and it’s been running perfectly until now.
Wouldn’t running “journalctl -u ceph-osd@5” on host ceph-004 show me the logs for osd.5 on that host?
On my containerized setup, the services that cephadm created are:
dcaro@node1:~ $ sudo systemctl list-units | grep ceph
ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@crash.node1.service loaded active running Ceph crash.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mgr.node1.mhqltg.service loaded active running Ceph mgr.node1.mhqltg for d49b287a-b680-11eb-95d4-e45f010c03a8
ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@mon.node1.service loaded active running Ceph mon.node1 for d49b287a-b680-11eb-95d4-e45f010c03a8
ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.3.service loaded active running Ceph osd.3 for d49b287a-b680-11eb-95d4-e45f010c03a8
ceph-d49b287a-b680-11eb-95d4-e45f010c03a8@osd.7.service loaded active running Ceph osd.7 for d49b287a-b680-11eb-95d4-e45f010c03a8
system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice loaded active active system-ceph\x2dd49b287a\x2db680\x2d11eb\x2d95d4\x2de45f010c03a8.slice
ceph-d49b287a-b680-11eb-95d4-e45f010c03a8.target loaded active active Ceph cluster d49b287a-b680-11eb-95d4-e45f010c03a8
ceph.target loaded active active All Ceph clusters and services
where the string after 'ceph-' is the fsid of the cluster.
Hope that helps (you can use the systemctl list-units also to search the specific ones on yours).
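So for osd.5 on ceph-004, something along these lines should work (<fsid> is just a placeholder for the id that systemctl list-units shows on your host):

    journalctl -u ceph-<fsid>@osd.5.service
    # or let cephadm resolve the unit name for you
    # (you may need to add --fsid if the host has more than one cluster)
    sudo cephadm logs --name osd.5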
Cheers,
A
From: Marc<mailto:Marc@xxxxxxxxxxxxxxxxx>
Sent: 06 August 2021 08:54
To: Andrew Walker-Brown<mailto:andrew_jbrown@xxxxxxxxxxx>; ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx>
Subject: RE: All OSDs on one host down
I’ve tried restarting one of the osds but that fails; journalctl shows
"osd not found"... not convinced I’ve got the systemctl command right.
Are you not mixing 'non-container commands' with 'container commands'? As in, if you execute this journalctl outside of the container, it will of course not find anything.
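If the goal is just to restart the daemon, in a cephadm deployment that would be something like the following (a sketch only; <fsid> is a placeholder and osd.5 is the example from above):

    sudo systemctl restart ceph-<fsid>@osd.5.service
    # or via the orchestrator, from any node with a working ceph CLI
    ceph orch daemon restart osd.5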
--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx