Re: How to troubleshoot monitor node

> On Jan 11, 2022, at 00:19, Andre Tann <atann@xxxxxxxxxxxx> wrote:
> 
> Hi Janne,
> 
>> On 10.01.22 16:49, Janne Johansson wrote:
>> 
>> Well, nc would not tell you if a bad (local or remote) firewall
>> configuration prevented nc (and ceph -s) from connecting, it would
>> give the same results as if the daemon wasn't listening at all, so
>> that is why I suggested checking if the port was listening or not,
>> instead of doing a wide test that may have multiple causes for same
>> failure.
> 
> Understood, you're right.
> 
> 
>> so if 192.168.14.48 indeed is a mon host, then "systemctl" should show
>> if the ceph-mon@<my-hostname> is running or not, 
> 
> 
> root@mon01:~# systemctl status ceph-mon@mon01
> Unit ceph-mon@mon01.service could not be found.
> 
> But here I see all are running:
> 
> root@mon01:~# systemctl list-units --type=service | grep ceph
> ceph-b61400fe-6e25-11ec-b322-896f8c260566@alertmanager.mon01.service
>  loaded active running Ceph alertmanager.mon01 for b61400fe-[...]
> ceph-b61400fe-6e25-11ec-b322-896f8c260566@crash.mon01.service
>  loaded active running Ceph crash.mon01 for b61400fe-[...]
> ceph-b61400fe-6e25-11ec-b322-896f8c260566@grafana.mon01.service
>  loaded active running Ceph grafana.mon01 for b61400fe-[...]
> ceph-b61400fe-6e25-11ec-b322-896f8c260566@mgr.mon01.wjrbzl.service
>  loaded active running Ceph mgr.mon01.wjrbzl for b61400fe-[...]
> ceph-b61400fe-6e25-11ec-b322-896f8c260566@node-exporter.mon01.service
>  loaded active running Ceph node-exporter.mon01 for b61400fe-[...]
> ceph-b61400fe-6e25-11ec-b322-896f8c260566@prometheus.mon01.service
>  loaded active running Ceph prometheus.mon01 for b61400fe-[...]
> 
> (Reformatted it to be more readable)

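So this cluster was deployed with cephadm, which names its systemd units ceph-<fsid>@<daemon-type>.<daemon-id>.service. To check the monitor, please use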

systemctl status ceph-b61400fe-6e25-11ec-b322-896f8c260566@mon.mon01.service
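If it helps, here is a rough sketch of how that unit name is put together and how you could check whether it shows up in the unit list at all. The helper names cephadm_unit and unit_listed are made up for this example, and the fsid is simply the one visible in your unit names.

```shell
#!/bin/sh
# Sketch: derive a cephadm systemd unit name and check whether it is listed.
# cephadm_unit and unit_listed are hypothetical helper names.

# cephadm names its units ceph-<fsid>@<daemon-type>.<daemon-id>.service
cephadm_unit() {
    printf 'ceph-%s@%s.%s.service\n' "$1" "$2" "$3"
}

# Succeeds if the unit name given as $1 appears in the listing read from stdin.
unit_listed() {
    grep -qF -- "$1"
}

fsid=b61400fe-6e25-11ec-b322-896f8c260566   # fsid taken from the unit names quoted above
unit=$(cephadm_unit "$fsid" mon mon01)
echo "$unit"

# On the mon host itself (not executed by this sketch):
#   systemctl list-units --type=service --all | unit_listed "$unit" \
#       || echo "no mon unit known on this host"
#   systemctl status "$unit"
```

If the unit is not listed even with --all, that would suggest no mon daemon has been deployed on this host, which is a different problem from a mon that fails to start.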

>> and logs from
>> /var/log/ceph/ceph-mon-hostname.log should indicate why it will not
>> start or why it is not running currently.
> 
> Doesn't exist:
> 
> root@mon01:~# ls -l /var/log/ceph/
> total 988
> drwxrwx--- 2  167  167   4096 Jan  5 13:50 b61400fe-[...]
> -rw-r--r-- 1 ceph ceph 872301 Jan 10 16:57 cephadm.log
> -rw-r--r-- 1 ceph ceph 123769 Jan  5 13:50 cephadm.log.1

With cephadm, the daemons log to the journal by default instead of /var/log/ceph, so you can find the logs with

sudo journalctl -u ceph-b61400fe-6e25-11ec-b322-896f8c260566@mon.mon01.service
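The full journal for a unit can be long, so a small wrapper that limits the time window may be handy. This is only a sketch: recent_unit_logs and the JOURNALCTL override are names I made up, while -u, --since and --no-pager are standard journalctl options.

```shell
#!/bin/sh
# Sketch: show recent journal entries for one cephadm-managed unit.
# recent_unit_logs is a hypothetical helper; set JOURNALCTL=echo for a dry run.

recent_unit_logs() {
    unit=$1
    since=${2:-"1 hour ago"}   # any time spec journalctl accepts, e.g. "today"
    "${JOURNALCTL:-journalctl}" -u "$unit" --since "$since" --no-pager
}

# Dry run: substitute echo for journalctl to print the arguments that would be used.
JOURNALCTL=echo
recent_unit_logs ceph-b61400fe-6e25-11ec-b322-896f8c260566@mon.mon01.service today
```

Unset JOURNALCTL (and run as root or with sudo) to query the real journal.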

> Inside the b61400fe-... subdir there is only ceph-volume.log. This now makes me curious:
> 
> ceph_volume.exceptions.ConfigurationError: Unable to load expected Ceph config at: /etc/ceph/ceph.conf
> 
> But ceph.conf exists:
> root@mon01:~# ls -l /etc/ceph/ceph.conf
> -rw-r--r-- 1 root root 177 Jan  5 13:49 /etc/ceph/ceph.conf
> 
> Why so? The file is 644, /etc/ceph is 755, as is /etc, so the file is world readable.

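The log may refer to the path inside the container, not the one on the host. With cephadm, almost everything runs in containers, so a file that exists on the host is only visible to a daemon if it is mounted into that daemon's container.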
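To see the difference for yourself, you could compare the file on the host with what a Ceph container sees. A minimal sketch, assuming cephadm is installed on the host: conf_on_host, conf_in_container and the CEPHADM override are hypothetical names, and the container started by "cephadm shell" is not necessarily set up the same way as the one in which ceph-volume ran, so treat the result as a hint only.

```shell
#!/bin/sh
# Sketch: compare /etc/ceph/ceph.conf on the host and inside a cephadm container.
# conf_on_host and conf_in_container are hypothetical helper names.

conf=/etc/ceph/ceph.conf

conf_on_host() {
    ls -l "$conf"
}

conf_in_container() {
    # "cephadm shell -- <command>" runs the command inside a Ceph container.
    # Set CEPHADM=echo to print the arguments instead of running anything.
    "${CEPHADM:-cephadm}" shell -- ls -l "$conf"
}

# On the mon host, as root (not executed by this sketch):
#   conf_on_host
#   conf_in_container
```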

> I googled of course, and found a bug report [1], but no further info about it.
> Others seem to have the same issue, but it is not clear to me how to track down what's up & how to fix it.
> 
> 
> [1] https://tracker.ceph.com/issues/47633
> 
> -- 
> Andre Tann
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx