Hi Eugen.
Sorry for the double email, but now it stopped complaining (too much) about
repositories and NTP and moved forward again.
So, I ran on master:
################
# salt-run state.orch ceph.stage.deploy
firewall                 : disabled
apparmor                 : disabled
fsid                     : valid
public_network           : valid
cluster_network          : valid
cluster_interface        : valid
monitors                 : valid
mgrs                     : valid
storage                  : valid
ganesha                  : valid
master_role              : valid
time_server              : valid
fqdn                     : valid
[ERROR ] {'out': 'highstate', 'ret': {'bohemia.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.639582', 'duration': 40.998, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.680857', 'duration': 19.265, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
bohemia.iq.ufrgs.br for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:51.701179', 'duration': 38.789, '__id__': 'deploy
OSDs'}}, 'torcello.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.768119', 'duration': 39.544, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.807977', 'duration': 16.645, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
torcello.iq.ufrgs.br for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:51.825744', 'duration': 39.334, '__id__': 'deploy
OSDs'}}, 'patricia.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:52.039506', 'duration': 41.975, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed': {'changes': {},
'pchanges': {}, 'comment': 'File /etc/ceph/ceph.client.storage.keyring is
in the correct state', 'name': '/etc/ceph/ceph.client.storage.keyring',
'result': True, '__sls__': 'ceph.osd.keyring.default', '__run_num__': 1,
'start_time': '12:43:52.081767', 'duration': 17.852, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
patricia.iq.ufrgs.br for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:52.100661', 'duration': 37.546, '__id__': 'deploy
OSDs'}}, 'original.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.603233', 'duration': 42.789, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.646306', 'duration': 17.852, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
original.iq.ufrgs.br for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:51.665215', 'duration': 39.763, '__id__': 'deploy
OSDs'}}, 'polar.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.896018', 'duration': 41.14, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.937429', 'duration': 17.788, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
polar.iq.ufrgs.br for cephdisks.list', 'result':
False, '__sls__': 'ceph.osd.default', '__run_num__': 2, 'start_time':
'12:43:51.956243', 'duration': 39.991, '__id__': 'deploy OSDs'}},
'pilsen.iq.ufrgs.br':
{'file_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-/var/lib/ceph/bootstrap-osd/ceph.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/var/lib/ceph/bootstrap-osd/ceph.keyring is in the correct state', 'name':
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 0, 'start_time':
'12:43:51.892066', 'duration': 48.973, '__id__':
'/var/lib/ceph/bootstrap-osd/ceph.keyring'},
'file_|-/etc/ceph/ceph.client.storage.keyring_|-/etc/ceph/ceph.client.storage.keyring_|-managed':
{'changes': {}, 'pchanges': {}, 'comment': 'File
/etc/ceph/ceph.client.storage.keyring is in the correct state', 'name':
'/etc/ceph/ceph.client.storage.keyring', 'result': True, '__sls__':
'ceph.osd.keyring.default', '__run_num__': 1, 'start_time':
'12:43:51.941367', 'duration': 22.047, '__id__':
'/etc/ceph/ceph.client.storage.keyring'}, 'module_|-deploy
OSDs_|-osd.deploy_|-run': {'name': 'osd.deploy', 'changes': {}, 'comment':
'Module function osd.deploy threw an exception. Exception: Mine on
pilsen.iq.ufrgs.br for cephdisks.list',
'result': False, '__sls__': 'ceph.osd.default', '__run_num__': 2,
'start_time': '12:43:51.964395', 'duration': 40.246, '__id__': 'deploy
OSDs'}}}}
torcello.iq.ufrgs.br_master:
  Name: time - Function: salt.state - Result: Changed Started: - 12:36:24.393407 Duration: 121265.391 ms
  Name: packages - Function: salt.state - Result: Clean Started: - 12:38:25.659248 Duration: 91677.31 ms
  Name: configuration check - Function: salt.state - Result: Clean Started: - 12:39:57.336932 Duration: 1045.213 ms
  Name: create ceph.conf - Function: salt.state - Result: Changed Started: - 12:39:58.382450 Duration: 7148.081 ms
  Name: configuration - Function: salt.state - Result: Changed Started: - 12:40:05.530875 Duration: 988.117 ms
  Name: admin - Function: salt.state - Result: Clean Started: - 12:40:06.519312 Duration: 1016.422 ms
  Name: mgr keyrings - Function: salt.state - Result: Clean Started: - 12:40:07.536035 Duration: 1017.157 ms
  Name: monitors - Function: salt.state - Result: Changed Started: - 12:40:08.553496 Duration: 14750.002 ms
  Name: mgr auth - Function: salt.state - Result: Changed Started: - 12:40:23.303905 Duration: 6520.062 ms
  Name: mgrs - Function: salt.state - Result: Changed Started: - 12:40:29.824403 Duration: 14892.393 ms
  Name: setup ceph exporter - Function: salt.state - Result: Clean Started: - 12:40:44.717200 Duration: 91116.685 ms
  Name: setup rbd exporter - Function: salt.state - Result: Clean Started: - 12:42:15.834030 Duration: 92481.184 ms
  Name: osd auth - Function: salt.state - Result: Changed Started: - 12:43:48.315349 Duration: 1707.652 ms
  Name: sysctl - Function: salt.state - Result: Changed Started: - 12:43:50.023309 Duration: 1016.997 ms
----------
          ID: storage
    Function: salt.state
      Result: False
     Comment: Run failed on minions: polar.iq.ufrgs.br, torcello.iq.ufrgs.br, bohemia.iq.ufrgs.br, patricia.iq.ufrgs.br, pilsen.iq.ufrgs.br, original.iq.ufrgs.br
     Started: 12:43:51.040746
    Duration: 978.671 ms
     Changes:
              bohemia.iq.ufrgs.br:
                Name: /var/lib/ceph/bootstrap-osd/ceph.keyring - Function: file.managed - Result: Clean Started: - 12:43:51.639582 Duration: 40.998 ms
                Name: /etc/ceph/ceph.client.storage.keyring - Function: file.managed - Result: Clean Started: - 12:43:51.680857 Duration: 19.265 ms
              ----------
                        ID: deploy OSDs
                  Function: module.run
                      Name: osd.deploy
                    Result: False
                   Comment: Module function osd.deploy threw an exception. Exception: Mine on bohemia.iq.ufrgs.br for cephdisks.list
                   Started: 12:43:51.701179
                  Duration: 38.789 ms
                   Changes:

              Summary for bohemia.iq.ufrgs.br
              ------------
              Succeeded: 2
              Failed:    1
              ------------
              Total states run:     3
              Total run time:  99.052 ms

(...goes on and on and on for all other nodes, until...)

Summary for torcello.iq.ufrgs.br_master
-------------
Succeeded: 14 (changed=8)
Failed:     1
-------------
Total states run:     15
Total run time:  447.621 s
################
In the deepsea monitor (running on master) I got:
##################
Starting stage: ceph.stage.deploy
Parsing ceph.stage.deploy steps... ⏳
Parsing ceph.stage.deploy steps... ✓
Stage initialization output:
firewall                 : disabled
apparmor                 : disabled
fsid                     : valid
public_network           : valid
cluster_network          : valid
cluster_interface        : valid
monitors                 : valid
mgrs                     : valid
storage                  : valid
ganesha                  : valid
master_role              : valid
time_server              : valid
fqdn                     : valid
[1/71] ceph.time on
        patricia.iq.ufrgs.br....................................... ✓ (94s)
        original.iq.ufrgs.br....................................... ✓ (92s)
        bohemia.iq.ufrgs.br........................................ ✓ (83s)
        polar.iq.ufrgs.br.......................................... ✓ (87s)
        torcello.iq.ufrgs.br....................................... ✓ (1s)
        pilsen.iq.ufrgs.br......................................... ✓ (121s)
[2/71] ceph.packages on
        patricia.iq.ufrgs.br....................................... ✓ (80s)
        original.iq.ufrgs.br....................................... ✓ (84s)
        bohemia.iq.ufrgs.br........................................ ✓ (89s)
        polar.iq.ufrgs.br.......................................... ✓ (91s)
        torcello.iq.ufrgs.br....................................... ✓ (89s)
        pilsen.iq.ufrgs.br......................................... ✓ (80s)
(...goes on and on and on, until...)
[14/71] ceph.sysctl on
        patricia.iq.ufrgs.br....................................... ✓ (0.5s)
        original.iq.ufrgs.br....................................... ✓ (0.7s)
        bohemia.iq.ufrgs.br........................................ ✓ (0.6s)
        polar.iq.ufrgs.br.......................................... ✓ (0.6s)
        torcello.iq.ufrgs.br....................................... ✓ (0.5s)
        pilsen.iq.ufrgs.br......................................... ✓ (0.5s)
[15/71] ceph.osd on
        patricia.iq.ufrgs.br....................................... ❌ (0.6s)
        original.iq.ufrgs.br....................................... ❌ (0.6s)
        bohemia.iq.ufrgs.br........................................ ❌ (0.5s)
        polar.iq.ufrgs.br.......................................... ❌ (0.6s)
        torcello.iq.ufrgs.br....................................... ❌ (0.6s)
        pilsen.iq.ufrgs.br......................................... ❌ (0.7s)
Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=472.5s
Failures summary:
ceph.osd (/srv/salt/ceph/osd):
  bohemia.iq.ufrgs.br: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on bohemia.iq.ufrgs.br for cephdisks.list
  torcello.iq.ufrgs.br: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on torcello.iq.ufrgs.br for cephdisks.list
  patricia.iq.ufrgs.br: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on patricia.iq.ufrgs.br for cephdisks.list
  original.iq.ufrgs.br: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on original.iq.ufrgs.br for cephdisks.list
  polar.iq.ufrgs.br: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on polar.iq.ufrgs.br for cephdisks.list
  pilsen.iq.ufrgs.br: deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on pilsen.iq.ufrgs.br for cephdisks.list
##################
Since all minions fail at the same point, with only minor differences between
them, I picked the shortest "tail -f" output (among all minions) that I got:
#################
# tail -f /var/log/messages > teste.polar.minion
# cat teste.polar.minion
2018-08-31T12:34:18.266952-03:00 polar systemd[6856]: Stopped target Timers.
2018-08-31T12:34:18.267459-03:00 polar systemd[6856]: Stopped target Sockets.
2018-08-31T12:34:18.267950-03:00 polar systemd[6856]: Closed D-Bus User Message Bus Socket.
2018-08-31T12:34:18.268427-03:00 polar systemd[6856]: Stopped target Paths.
2018-08-31T12:34:18.268909-03:00 polar systemd[6856]: Reached target Shutdown.
2018-08-31T12:34:18.269396-03:00 polar systemd[6856]: Starting Exit the Session...
2018-08-31T12:34:18.285420-03:00 polar systemd[6856]: Received SIGRTMIN+24 from PID 6899 (kill).
2018-08-31T12:34:18.313743-03:00 polar systemd[1]: Stopped User Manager for UID 0.
2018-08-31T12:34:18.315983-03:00 polar systemd[1]: Removed slice User Slice of root.
2018-08-31T12:35:14.781367-03:00 polar node_exporter[734]: time="2018-08-31T12:35:14-03:00" level=error msg="ERROR: ntp collector failed after 0.003741s: couldn't get SNTP reply: read udp 127.0.0.1:58967->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:36:14.780678-03:00 polar node_exporter[734]: time="2018-08-31T12:36:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000262s: couldn't get SNTP reply: read udp 127.0.0.1:57202->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:36:28.524615-03:00 polar dbus-daemon[765]: [system] Activating service name='org.opensuse.Snapper' requested by ':1.37' (uid=0 pid=7132 comm="/usr/bin/python3 /usr/bin/salt-minion ") (using servicehelper)
2018-08-31T12:36:28.529940-03:00 polar dbus-daemon[765]: [system] Successfully activated service 'org.opensuse.Snapper'
2018-08-31T12:37:14.776684-03:00 polar node_exporter[734]: time="2018-08-31T12:37:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000074s: couldn't get SNTP reply: read udp 127.0.0.1:50500->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:37:51.713246-03:00 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
2018-08-31T12:37:51.721836-03:00 polar systemd[1]: Reloading.
2018-08-31T12:37:51.877175-03:00 polar systemd[1]: nss-lookup.target: Dependency Before=nss-lookup.target dropped
2018-08-31T12:37:52.040513-03:00 polar systemd[1]: is_symlink_with_known_name(chronyd.service, chronyd.service) → 1
2018-08-31T12:37:52.064225-03:00 polar systemd[1]: is_symlink_with_known_name(chronyd.service, chronyd.service) → 1
2018-08-31T12:38:14.778499-03:00 polar node_exporter[734]: time="2018-08-31T12:38:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000266s: couldn't get SNTP reply: read udp 127.0.0.1:55875->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:38:28.737138-03:00 polar dbus-daemon[765]: [system] Activating service name='org.opensuse.Snapper' requested by ':1.40' (uid=0 pid=8631 comm="/usr/bin/python3 /usr/bin/salt-minion ") (using servicehelper)
2018-08-31T12:38:28.742474-03:00 polar dbus-daemon[765]: [system] Successfully activated service 'org.opensuse.Snapper'
2018-08-31T12:39:14.778453-03:00 polar node_exporter[734]: time="2018-08-31T12:39:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000279s: couldn't get SNTP reply: read udp 127.0.0.1:49446->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:40:11.630212-03:00 polar dbus-daemon[765]: [system] Activating service name='org.opensuse.Snapper' requested by ':1.43' (uid=0 pid=10146 comm="/usr/bin/python3 /usr/bin/salt-minion ") (using servicehelper)
2018-08-31T12:40:11.635544-03:00 polar dbus-daemon[765]: [system] Successfully activated service 'org.opensuse.Snapper'
2018-08-31T12:40:14.776766-03:00 polar node_exporter[734]: time="2018-08-31T12:40:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000132s: couldn't get SNTP reply: read udp 127.0.0.1:51148->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:40:23.136524-03:00 polar systemd[1]: is_symlink_with_known_name(ceph-mon@polar.service, ceph-mon@polar.service) → 1
2018-08-31T12:40:23.163808-03:00 polar systemd[1]: is_symlink_with_known_name(ceph-mon@polar.service, ceph-mon@polar.service) → 1
2018-08-31T12:40:23.166840-03:00 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
2018-08-31T12:40:44.555371-03:00 polar systemd[1]: is_symlink_with_known_name(ceph-mgr@polar.service, ceph-mgr@polar.service) → 1
2018-08-31T12:40:44.582589-03:00 polar systemd[1]: is_symlink_with_known_name(ceph-mgr@polar.service, ceph-mgr@polar.service) → 1
2018-08-31T12:40:44.585595-03:00 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
2018-08-31T12:41:14.779690-03:00 polar node_exporter[734]: time="2018-08-31T12:41:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000073s: couldn't get SNTP reply: read udp 127.0.0.1:41116->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:42:14.779059-03:00 polar node_exporter[734]: time="2018-08-31T12:42:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000232s: couldn't get SNTP reply: read udp 127.0.0.1:43448->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:43:14.781749-03:00 polar node_exporter[734]: time="2018-08-31T12:43:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000829s: couldn't get SNTP reply: read udp 127.0.0.1:47369->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:43:51.957015-03:00 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
2018-08-31T12:43:51.995945-03:00 polar salt-minion[1421]: [ERROR   ] Mine on polar.iq.ufrgs.br for cephdisks.list
2018-08-31T12:43:51.996215-03:00 polar salt-minion[1421]: [ERROR   ] Module function osd.deploy threw an exception. Exception: Mine on polar.iq.ufrgs.br for cephdisks.list
2018-08-31T12:44:14.782448-03:00 polar node_exporter[734]: time="2018-08-31T12:44:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000093s: couldn't get SNTP reply: read udp 127.0.0.1:52722->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:45:14.778780-03:00 polar node_exporter[734]: time="2018-08-31T12:45:14-03:00" level=error msg="ERROR: ntp collector failed after 0.000285s: couldn't get SNTP reply: read udp 127.0.0.1:56612->127.0.0.1:123: read: connection refused" source="collector.go:123"
2018-08-31T12:46:14.782255-03:00 polar node_exporter[734]: time="2018-08-31T12:46:14-03:00" level=error msg="ERROR: ntp collector failed after 0.002603s: couldn't get SNTP reply: read udp 127.0.0.1:46274->127.0.0.1:123: read: connection refused" source="collector.go:123"
#################
And just to be on the safe side, I also did (again on the very same minion):
#################
# systemctl status salt-minion.service
● salt-minion.service - The Salt Minion
   Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-08-31 12:01:04 -03; 50min ago
 Main PID: 1421 (salt-minion)
    Tasks: 6 (limit: 629145)
   CGroup: /system.slice/salt-minion.service
           ├─1421 /usr/bin/python3 /usr/bin/salt-minion
           ├─1595 /usr/bin/python3 /usr/bin/salt-minion
           └─1647 /usr/bin/python3 /usr/bin/salt-minion

ago 31 12:00:59 polar systemd[1]: Starting The Salt Minion...
ago 31 12:01:04 polar systemd[1]: Started The Salt Minion.
ago 31 12:11:04 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
ago 31 12:23:30 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
ago 31 12:37:51 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
ago 31 12:40:23 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
ago 31 12:40:44 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
ago 31 12:43:51 polar salt-minion[1421]: [WARNING ] The function "module.run" is using its deprecated version and will expire in version "Sodium".
ago 31 12:43:51 polar salt-minion[1421]: [ERROR   ] Mine on polar.iq.ufrgs.br for cephdisks.list
ago 31 12:43:51 polar salt-minion[1421]: [ERROR   ] Module function osd.deploy threw an exception. Exception: Mine on polar.iq.ufrgs.br for cephdisks.list
#################
So mostly I see warnings about the SNTP server (possibly due to rainy weather
disturbing our already-flaky internet connection) and about deprecated
software versions. The actual error I'm getting (but still don't understand
how to solve) is:
#################
2018-08-31T12:43:51.995945-03:00 polar salt-minion[1421]: [ERROR   ] Mine on polar.iq.ufrgs.br for cephdisks.list
2018-08-31T12:43:51.996215-03:00 polar salt-minion[1421]: [ERROR   ] Module function osd.deploy threw an exception. Exception: Mine on polar.iq.ufrgs.br for cephdisks.list
#################
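Reading that error, my understanding is that osd.deploy asks the Salt mine for
each minion's cephdisks.list data and finds nothing there. In case it helps
the discussion, this is a rough sketch of how one might inspect and refresh
the mine from the master (mine.get and mine.update are stock Salt mine
functions; "cephdisks.list" is the DeepSea module named in the error; the
'polar*' target is just an example, not verified against this cluster):

```shell
# Hypothetical diagnostic sketch, guarded so it is a no-op on a machine
# where salt is not installed.
if command -v salt >/dev/null 2>&1; then
  # What does the mine currently return for the storage node?
  salt 'polar*' mine.get 'polar*' cephdisks.list

  # Ask every minion to repopulate its mine data, then re-check.
  salt '*' mine.update
  salt 'polar*' mine.get 'polar*' cephdisks.list
fi
```

If the mine stayed empty even after mine.update, maybe re-running the
discovery stage before stage.3 would repopulate it, but I'm not sure that is
the right fix.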
Any ideas on how to proceed from here? :( I'm totally clueless. :(
Thanks a lot once again,
Jones
On Fri, Aug 31, 2018 at 4:00 AM Eugen Block <eblock@xxxxxx> wrote:
Hi,
I'm not sure if there's a misunderstanding. You need to track the logs
during the osd deployment step (stage.3), that is where it fails, and
this is where /var/log/messages could be useful. Since the deployment
failed you have no systemd-units (ceph-osd@<ID>.service) to log
anything.
Before running stage.3 again try something like
grep -C5 ceph-disk /var/log/messages (or messages-201808*.xz)
or
grep -C5 sda4 /var/log/messages (or messages-201808*.xz)
If that doesn't reveal anything run stage.3 again and watch the logs.
Regards,
Eugen
Zitat von Jones de Andrade <johannesrs@xxxxxxxxx>:
> Hi Eugen.
>
> Ok, edited the file /etc/salt/minion, uncommented the "log_level_logfile"
> line and set it to "debug" level.
>
> Turned off the computer, waited a few minutes so that the time frame would
> stand out in the /var/log/messages file, and restarted the computer.
>
> Using vi I "greped out" (awful wording) the reboot section. From that, I
> also removed most of what it seemed totally unrelated to ceph, salt,
> minions, grafana, prometheus, whatever.
>
> I got the lines below. It does not seem to complain about anything that I
> can see. :(
>
> ################
> 2018-08-30T15:41:46.455383-03:00 torcello systemd[1]: systemd 234 running in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN2 -IDN default-hierarchy=hybrid)
> 2018-08-30T15:41:46.456330-03:00 torcello systemd[1]: Detected architecture x86-64.
> 2018-08-30T15:41:46.456350-03:00 torcello systemd[1]: nss-lookup.target: Dependency Before=nss-lookup.target dropped
> 2018-08-30T15:41:46.456357-03:00 torcello systemd[1]: Started Load Kernel Modules.
> 2018-08-30T15:41:46.456369-03:00 torcello systemd[1]: Starting Apply Kernel Variables...
> 2018-08-30T15:41:46.457230-03:00 torcello systemd[1]: Started Alertmanager for prometheus.
> 2018-08-30T15:41:46.457237-03:00 torcello systemd[1]: Started Monitoring system and time series database.
> 2018-08-30T15:41:46.457403-03:00 torcello systemd[1]: Starting NTP client/server...
>
> 2018-08-30T15:41:46.457425-03:00 torcello systemd[1]: Started Prometheus exporter for machine metrics.
> 2018-08-30T15:41:46.457706-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.797896888Z caller=main.go:225 msg="Starting Prometheus" version="(version=2.1.0, branch=non-git, revision=non-git)"
> 2018-08-30T15:41:46.457712-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.797969232Z caller=main.go:226 build_context="(go=go1.9.4, user=abuild@lamb69, date=20180513-03:46:03)"
> 2018-08-30T15:41:46.457719-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.798008802Z caller=main.go:227 host_details="(Linux 4.12.14-lp150.12.4-default #1 SMP Tue May 22 05:17:22 UTC 2018 (66b2eda) x86_64 torcello (none))"
> 2018-08-30T15:41:46.457726-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.798044088Z caller=main.go:228 fd_limits="(soft=1024, hard=4096)"
> 2018-08-30T15:41:46.457738-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.802067189Z caller=web.go:383 component=web msg="Start listening for connections" address=0.0.0.0:9090
> 2018-08-30T15:41:46.457745-03:00 torcello prometheus[695]: level=info ts=2018-08-30T18:41:44.802037354Z caller=main.go:499 msg="Starting TSDB ..."
> 2018-08-30T15:41:46.458145-03:00 torcello smartd[809]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
> 2018-08-30T15:41:46.458321-03:00 torcello systemd[1]: Started NTP client/server.
> 2018-08-30T15:41:50.387157-03:00 torcello ceph_exporter[690]: 2018/08/30 15:41:50 Starting ceph exporter on ":9128"
> 2018-08-30T15:41:52.658272-03:00 torcello wicked[905]: lo up
> 2018-08-30T15:41:52.658738-03:00 torcello wicked[905]: eth0 up
> 2018-08-30T15:41:52.659989-03:00 torcello systemd[1]: Started wicked
> managed network interfaces.
> 2018-08-30T15:41:52.660514-03:00 torcello systemd[1]: Reached target
> Network.
> 2018-08-30T15:41:52.667938-03:00 torcello systemd[1]: Starting OpenSSH
> Daemon...
> 2018-08-30T15:41:52.668292-03:00 torcello systemd[1]: Reached target
> Network is Online.
>
> 2018-08-30T15:41:52.669132-03:00 torcello systemd[1]: Started Ceph cluster monitor daemon.
> 2018-08-30T15:41:52.669328-03:00 torcello systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
> 2018-08-30T15:41:52.670346-03:00 torcello systemd[1]: Started Ceph cluster manager daemon.
> 2018-08-30T15:41:52.670565-03:00 torcello systemd[1]: Reached target ceph target allowing to start/stop all ceph-mgr@.service instances at once.
> 2018-08-30T15:41:52.670839-03:00 torcello systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
> 2018-08-30T15:41:52.671246-03:00 torcello systemd[1]: Starting Login and
> scanning of iSCSI devices...
> *2018-08-30T15:41:52.672402-03:00 torcello systemd[1]: Starting Grafana
> instance...*
> 2018-08-30T15:41:52.678922-03:00 torcello systemd[1]: Started Backup of
> /etc/sysconfig.
> 2018-08-30T15:41:52.679109-03:00 torcello systemd[1]: Reached target Timers.
> 2018-08-30T15:41:52.679630-03:00 torcello systemd[1]: Started The Salt API.
> 2018-08-30T15:41:52.692944-03:00 torcello systemd[1]: Starting Postfix Mail Transport Agent...
> 2018-08-30T15:41:52.694687-03:00 torcello systemd[1]: Started The Salt Master Server.
> 2018-08-30T15:41:52.696821-03:00 torcello systemd[1]: Starting The Salt Minion...
> 2018-08-30T15:41:52.772750-03:00 torcello sshd-gen-keys-start[1408]:
> Checking for missing server keys in /etc/ssh
> 2018-08-30T15:41:52.818695-03:00 torcello iscsiadm[1412]: iscsiadm: No
> records found
> 2018-08-30T15:41:52.819541-03:00 torcello systemd[1]: Started Login and
> scanning of iSCSI devices.
> 2018-08-30T15:41:52.820214-03:00 torcello systemd[1]: Reached target Remote File Systems.
> 2018-08-30T15:41:52.821418-03:00 torcello systemd[1]: Starting Permit User Sessions...
> 2018-08-30T15:41:53.045278-03:00 torcello systemd[1]: Started Permit User
> Sessions.
> 2018-08-30T15:41:53.048482-03:00 torcello systemd[1]: Starting Hold until
> boot process finishes up...
> 2018-08-30T15:41:53.054461-03:00 torcello echo[1415]: Starting mail service (Postfix)
> 2018-08-30T15:41:53.447390-03:00 torcello sshd[1431]: Server listening on
> 0.0.0.0 port 22.
> 2018-08-30T15:41:53.447685-03:00 torcello sshd[1431]: Server listening on
> :: port 22.
> 2018-08-30T15:41:53.447907-03:00 torcello systemd[1]: Started OpenSSH
> Daemon.
>
>
>
>
>
>
>
>
>
>
>
>
>
> *2018-08-30T15:41:54.519192-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Starting Grafana" logger=server version=5.1.3 commit=NA compiled=2018-08-30T15:41:53-0300
> 2018-08-30T15:41:54.519664-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
> 2018-08-30T15:41:54.519979-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
> 2018-08-30T15:41:54.520257-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
> 2018-08-30T15:41:54.520546-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
> 2018-08-30T15:41:54.520823-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
> 2018-08-30T15:41:54.521085-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.provisioning=/etc/grafana/provisioning"
> 2018-08-30T15:41:54.521343-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
> 2018-08-30T15:41:54.521593-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
> 2018-08-30T15:41:54.521843-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
> 2018-08-30T15:41:54.522108-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
> 2018-08-30T15:41:54.522361-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisioning
> 2018-08-30T15:41:54.522611-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="App mode production" logger=settings
> 2018-08-30T15:41:54.522885-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Writing PID file" logger=server path=/var/run/grafana/grafana-server.pid pid=1413*
>
> *2018-08-30T15:41:54.523148-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Initializing DB" logger=sqlstore dbtype=sqlite3
> 2018-08-30T15:41:54.523398-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Starting DB migration" logger=migrator
> 2018-08-30T15:41:54.804052-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Executing migration" logger=migrator id="copy data account to org"
> 2018-08-30T15:41:54.804423-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account to org"
> 2018-08-30T15:41:54.804724-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Executing migration" logger=migrator id="copy data account_user to org_user"
> 2018-08-30T15:41:54.804985-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Skipping migration condition not fulfilled" logger=migrator id="copy data account_user to org_user"
> 2018-08-30T15:41:54.838327-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:54-0300 lvl=info msg="Starting plugin search" logger=plugins*
> 2018-08-30T15:41:54.947408-03:00 torcello systemd[1]: Starting Locale
> Service...
> 2018-08-30T15:41:54.979069-03:00 torcello systemd[1]: Started Locale
> Service.
>
> *2018-08-30T15:41:55.023859-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=info msg="Registering plugin" logger=plugins name=Discrete
> 2018-08-30T15:41:55.028462-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=eror msg="can't read datasource provisioning files from directory" logger=provisioning.datasources path=/etc/grafana/provisioning/datasources
> 2018-08-30T15:41:55.065462-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=eror msg="can't read dashboard provisioning files from directory" logger=provisioning.dashboard path=/etc/grafana/provisioning/dashboards
> 2018-08-30T15:41:55.065636-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing Alerting" logger=alerting.engine
> 2018-08-30T15:41:55.065779-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing CleanUpService" logger=cleanup*
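[The two lvl=eror provisioning messages in the Grafana log above are usually benign at this stage: Grafana 5.x only complains that its provisioning directories do not exist yet. If you want to silence them, creating the empty directories at the paths from the log is enough; a minimal sketch, assuming the stock paths shown above:

```shell
# create the provisioning directories grafana expects (paths from the log)
mkdir -p /etc/grafana/provisioning/datasources \
         /etc/grafana/provisioning/dashboards
```

Grafana picks them up on its next restart; this is unrelated to the OSD failure.]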
> 2018-08-30T15:41:55.274779-03:00 torcello systemd[1]: Started Grafana instance.
> *2018-08-30T15:41:55.313056-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing Stream Manager"
> 2018-08-30T15:41:55.313251-03:00 torcello grafana-server[1413]: t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing HTTP Server" logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=*
> 2018-08-30T15:41:58.304749-03:00 torcello systemd[1]: Started Command Scheduler.
> 2018-08-30T15:41:58.381694-03:00 torcello systemd[1]: Started The Salt Minion.
> 2018-08-30T15:41:58.386643-03:00 torcello cron[1611]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 11% if used.)
> 2018-08-30T15:41:58.396087-03:00 torcello cron[1611]: (CRON) INFO (running with inotify support)
> 2018-08-30T15:42:06.367096-03:00 torcello systemd[1]: Started Hold until boot process finishes up.
> 2018-08-30T15:42:06.369301-03:00 torcello systemd[1]: Started Getty on tty1.
> 2018-08-30T15:42:11.535310-03:00 torcello systemd[1792]: Reached target Paths.
> 2018-08-30T15:42:11.536128-03:00 torcello systemd[1792]: Starting D-Bus User Message Bus Socket.
> 2018-08-30T15:42:11.536378-03:00 torcello systemd[1792]: Reached target Timers.
> 2018-08-30T15:42:11.598968-03:00 torcello systemd[1792]: Listening on D-Bus User Message Bus Socket.
> 2018-08-30T15:42:11.599151-03:00 torcello systemd[1792]: Reached target Sockets.
> 2018-08-30T15:42:11.599277-03:00 torcello systemd[1792]: Reached target Basic System.
> 2018-08-30T15:42:11.599398-03:00 torcello systemd[1792]: Reached target Default.
> 2018-08-30T15:42:11.599514-03:00 torcello systemd[1792]: Startup finished in 145ms.
> 2018-08-30T15:42:11.599636-03:00 torcello systemd[1]: Started User Manager for UID 464.
> 2018-08-30T15:42:12.471869-03:00 torcello systemd[1792]: Started D-Bus User Message Bus.
> 2018-08-30T15:42:15.898853-03:00 torcello systemd[1]: Starting Disk Manager...
> 2018-08-30T15:42:15.974641-03:00 torcello systemd[1]: Started Disk Manager.
> 2018-08-30T15:42:16.897412-03:00 torcello node_exporter[807]: time="2018-08-30T15:42:16-03:00" level=error msg="ERROR: ntp collector failed after 0.000087s: couldn't get SNTP reply: read udp 127.0.0.1:42089 -> 127.0.0.1:123: read: connection refused" source="collector.go:123"
> 2018-08-30T15:42:17.589461-03:00 torcello chronyd[845]: Selected source 200.189.40.8
> 2018-08-30T15:43:16.899040-03:00 torcello node_exporter[807]: time="2018-08-30T15:43:16-03:00" level=error msg="ERROR: ntp collector failed after 0.000105s: couldn't get SNTP reply: read udp 127.0.0.1:59525 -> 127.0.0.1:123: read: connection refused" source="collector.go:123"
> 2018-08-30T15:44:15.496595-03:00 torcello systemd[1792]: Stopped target Default.
> 2018-08-30T15:44:15.496824-03:00 torcello systemd[1792]: Stopping D-Bus User Message Bus...
> 2018-08-30T15:44:15.502438-03:00 torcello systemd[1792]: Stopped D-Bus User Message Bus.
> 2018-08-30T15:44:15.502627-03:00 torcello systemd[1792]: Stopped target Basic System.
> 2018-08-30T15:44:15.502776-03:00 torcello systemd[1792]: Stopped target Paths.
> 2018-08-30T15:44:15.502923-03:00 torcello systemd[1792]: Stopped target Timers.
> 2018-08-30T15:44:15.503062-03:00 torcello systemd[1792]: Stopped target Sockets.
> 2018-08-30T15:44:15.503200-03:00 torcello systemd[1792]: Closed D-Bus User Message Bus Socket.
> 2018-08-30T15:44:15.503356-03:00 torcello systemd[1792]: Reached target Shutdown.
> 2018-08-30T15:44:15.503572-03:00 torcello systemd[1792]: Starting Exit the Session...
> 2018-08-30T15:44:15.511298-03:00 torcello systemd[2295]: Starting D-Bus User Message Bus Socket.
> 2018-08-30T15:44:15.511493-03:00 torcello systemd[2295]: Reached target Timers.
> 2018-08-30T15:44:15.511664-03:00 torcello systemd[2295]: Reached target Paths.
> 2018-08-30T15:44:15.517873-03:00 torcello systemd[2295]: Listening on D-Bus User Message Bus Socket.
> 2018-08-30T15:44:15.518060-03:00 torcello systemd[2295]: Reached target Sockets.
> 2018-08-30T15:44:15.518216-03:00 torcello systemd[2295]: Reached target Basic System.
> 2018-08-30T15:44:15.518373-03:00 torcello systemd[2295]: Reached target Default.
> 2018-08-30T15:44:15.518501-03:00 torcello systemd[2295]: Startup finished in 31ms.
> 2018-08-30T15:44:15.518634-03:00 torcello systemd[1]: Started User Manager for UID 1000.
> 2018-08-30T15:44:15.518759-03:00 torcello systemd[1792]: Received SIGRTMIN+24 from PID 2300 (kill).
> 2018-08-30T15:44:15.537634-03:00 torcello systemd[1]: Stopped User Manager for UID 464.
> 2018-08-30T15:44:15.538422-03:00 torcello systemd[1]: Removed slice User Slice of sddm.
> 2018-08-30T15:44:15.613246-03:00 torcello systemd[2295]: Started D-Bus User Message Bus.
> 2018-08-30T15:44:15.623989-03:00 torcello dbus-daemon[2311]: [session uid=1000 pid=2311] Successfully activated service 'org.freedesktop.systemd1'
> 2018-08-30T15:44:16.447162-03:00 torcello kapplymousetheme[2350]: kcm_input: Using X11 backend
> 2018-08-30T15:44:16.901642-03:00 torcello node_exporter[807]: time="2018-08-30T15:44:16-03:00" level=error msg="ERROR: ntp collector failed after 0.000205s: couldn't get SNTP reply: read udp 127.0.0.1:53434 -> 127.0.0.1:123: read: connection refused" source="collector.go:123"
> ################
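[A side note on the repeated node_exporter ntp collector errors in the log above: the collector sends an SNTP query to 127.0.0.1:123 by default, and chronyd does not answer NTP queries unless it is configured as a server, so "connection refused" is expected on a chrony-only node. A quick check, assuming `ss` from util-linux is available:

```shell
# is anything actually listening on the local NTP port?
if ss -uln | grep -q ':123 '; then
    echo "something is listening on udp/123"
else
    echo "no local NTP listener on udp/123 - this explains the collector errors"
fi
```

Either way this is noise from monitoring, not related to the OSD deployment failure.]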
>
> Any ideas?
>
> Thanks a lot,
>
> Jones
>
> On Thu, Aug 30, 2018 at 4:14 AM Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi,
>>
>> > So, it only contains logs concerning the node itself (is it correct? since
>> > node01 is also the master, I was expecting it to have logs from the others
>> > too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have
>> > available, and nothing "shines out" (sorry for my poor English) as a
>> > possible error.
>>
>> the logging is not configured to be centralised by default; you would
>> have to configure that yourself.
>>
>> Regarding the OSDs, if there are OSD logs created, they're created on
>> the OSD nodes, not on the master. But since the OSD deployment fails,
>> there probably are no OSD-specific logs yet. So you'll have to take a
>> look into the syslog (/var/log/messages); that's where the salt-minion
>> reports its attempts to create the OSDs. Chances are high that you'll
>> find the root cause there.
>>
>> If the output is not enough, set the log-level to debug:
>>
>> osd-1:~ # grep -E "^log_level" /etc/salt/minion
>> log_level: debug
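[For completeness: after changing that setting the minion has to be restarted before it takes effect; something like this on each OSD node (a sketch, assuming the default Salt file locations and a systemd-managed minion):

```shell
# switch the minion to debug logging and follow its log
sed -i 's/^#\?log_level:.*/log_level: debug/' /etc/salt/minion
systemctl restart salt-minion
tail -f /var/log/salt/minion
```
]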
>>
>>
>> Regards,
>> Eugen
>>
>>
>> Zitat von Jones de Andrade <johannesrs@xxxxxxxxx>:
>>
>> > Hi Eugen.
>> >
>> > Sorry for the delay in answering.
>> >
>> > Just looked in the /var/log/ceph/ directory. It only contains the
>> following
>> > files (for example on node01):
>> >
>> > #######
>> > # ls -lart
>> > total 3864
>> > -rw------- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
>> > drwxr-xr-x 1 root root     898 ago 28 10:07 ..
>> > -rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
>> > -rw------- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
>> > -rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
>> > -rw------- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
>> > drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
>> > -rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
>> > -rw------- 1 ceph ceph  175229 ago 29 12:48 ceph.log
>> > -rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
>> > #######
>> >
>> > So, it only contains logs concerning the node itself (is it correct? since
>> > node01 is also the master, I was expecting it to have logs from the others
>> > too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have
>> > available, and nothing "shines out" (sorry for my poor English) as a
>> > possible error.
>> >
>> > Any suggestion on how to proceed?
>> >
>> > Thanks a lot in advance,
>> >
>> > Jones
>> >
>> >
>> > On Mon, Aug 27, 2018 at 5:29 AM Eugen Block <eblock@xxxxxx> wrote:
>> >
>> >> Hi Jones,
>> >>
>> >> all ceph logs are in the directory /var/log/ceph/, each daemon has
its
>> >> own log file, e.g. OSD logs are named ceph-osd.*.
>> >>
>> >> I haven't tried it but I don't think SUSE Enterprise Storage deploys
>> >> OSDs on partitioned disks. Is there a way to attach a second disk to
>> >> the OSD nodes, maybe via USB or something?
>> >>
>> >> Although this thread is ceph related it is referring to a specific
>> >> product, so I would recommend to post your question in the SUSE forum
>> >> [1].
>> >>
>> >> Regards,
>> >> Eugen
>> >>
>> >> [1]
https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage
>> >>
>> >> Zitat von Jones de Andrade <johannesrs@xxxxxxxxx>:
>> >>
>> >> > Hi Eugen.
>> >> >
>> >> > Thanks for the suggestion. I'll look for the logs (since it's our
>> first
>> >> > attempt with ceph, I'll have to discover where they are, but no
>> problem).
>> >> >
>> >> > One thing called my attention on your response however:
>> >> >
>> >> > I haven't made myself clear, but one of the failures we encountered was
>> >> > that the files now containing:
>> >> >
>> >> > node02:
>> >> > ----------
>> >> > storage:
>> >> > ----------
>> >> > osds:
>> >> > ----------
>> >> > /dev/sda4:
>> >> > ----------
>> >> > format:
>> >> > bluestore
>> >> > standalone:
>> >> > True
>> >> >
>> >> > were originally empty, and we filled them by hand following a model found
>> >> > elsewhere on the web. It was necessary so that we could continue, but the
>> >> > model indicated that, for example, it should have the path for /dev/sda
>> >> > here, not /dev/sda4. We chose to include the specific partition
>> >> > identification because we won't have dedicated disks here, rather just the
>> >> > very same partition, as all disks were partitioned exactly the same.
>> >> >
>> >> > While that was enough for the procedure to continue at that point, now I
>> >> > wonder if it was the right call and, if it indeed was, if it was done
>> >> > properly. As such, I wonder: what do you mean by "wipe" the partition
>> >> > here? /dev/sda4 is created, but is both empty and unmounted: should a
>> >> > different operation be performed on it, should I remove it first, or
>> >> > should I have written the files above with only /dev/sda as target?
>> >> >
>> >> > I know I probably wouldn't run into these issues with dedicated disks,
>> >> > but unfortunately that is absolutely not an option.
>> >> >
>> >> > Thanks a lot in advance for any comments and/or extra suggestions.
>> >> >
>> >> > Sincerely yours,
>> >> >
>> >> > Jones
>> >> >
>> >> > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block <eblock@xxxxxx> wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >> take a look into the logs, they should point you in the right
>> direction.
>> >> >> Since the deployment stage fails at the OSD level, start with the
OSD
>> >> >> logs. Something's not right with the disks/partitions, did you
wipe
>> >> >> the partition from previous attempts?
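["Wiping" here means clearing leftover filesystem/partition signatures from earlier attempts so the deployment tooling sees the partition as empty. A sketch that demonstrates the idea on a scratch file rather than a real disk; on the actual node you would point `wipefs --all` at the reserved partition (e.g. /dev/sda4), which is destructive:

```shell
# demonstrate signature wiping on a scratch image instead of a real disk
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=16 status=none
mkfs.ext4 -q -F "$img"      # simulate a leftover filesystem signature
wipefs "$img"               # lists the ext4 signature
wipefs --all "$img"         # clears all signatures
wipefs "$img"               # prints nothing: the "disk" is clean again
rm -f "$img"
```
]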
>> >> >>
>> >> >> Regards,
>> >> >> Eugen
>> >> >>
>> >> >> Zitat von Jones de Andrade <johannesrs@xxxxxxxxx>:
>> >> >>
>> >> >>> (Please forgive my previous email: I was using another message
and
>> >> >>> completely forget to update the subject)
>> >> >>>
>> >> >>> Hi all.
>> >> >>>
>> >> >>> I'm new to ceph, and after having serious problems in ceph stages
>> 0, 1
>> >> >> and
>> >> >>> 2 that I could solve myself, now it seems that I have hit a wall
>> harder
>> >> >>> than my head. :)
>> >> >>>
>> >> >>> When I run salt-run state.orch ceph.stage.deploy and monitor it, I see
>> >> >>> it going up to here:
>> >> >>>
>> >> >>> #######
>> >> >>> [14/71] ceph.sysctl on
>> >> >>>         node01....................................... ✓ (0.5s)
>> >> >>>         node02....................................... ✓ (0.7s)
>> >> >>>         node03....................................... ✓ (0.6s)
>> >> >>>         node04....................................... ✓ (0.5s)
>> >> >>>         node05....................................... ✓ (0.6s)
>> >> >>>         node06....................................... ✓ (0.5s)
>> >> >>>
>> >> >>> [15/71] ceph.osd on
>> >> >>>         node01....................................... ❌ (0.7s)
>> >> >>>         node02....................................... ❌ (0.7s)
>> >> >>>         node03....................................... ❌ (0.7s)
>> >> >>>         node04....................................... ❌ (0.6s)
>> >> >>>         node05....................................... ❌ (0.6s)
>> >> >>>         node06....................................... ❌ (0.7s)
>> >> >>>
>> >> >>> Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
>> >> >>>
>> >> >>> Failures summary:
>> >> >>>
>> >> >>> ceph.osd (/srv/salt/ceph/osd):
>> >> >>>   node02:
>> >> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
>> >> >>>   node03:
>> >> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
>> >> >>>   node01:
>> >> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
>> >> >>>   node04:
>> >> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
>> >> >>>   node05:
>> >> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
>> >> >>>   node06:
>> >> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
>> >> >>> #######
>> >> >>>
>> >> >>> Since this is a first attempt on 6 simple test machines, we are going
>> >> >>> to put the mon, osds, etc. on all nodes at first. Only the master is
>> >> >>> kept on a single machine (node01) for now.
>> >> >>>
>> >> >>> As they are simple machines, they have a single HDD, which is
>> >> >>> partitioned as follows (the sda4 partition is unmounted and left for
>> >> >>> the ceph system):
>> >> >>>
>> >> >>> ###########
>> >> >>> # lsblk
>> >> >>> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> >> >>> sda 8:0 0 465,8G 0 disk
>> >> >>> ├─sda1 8:1 0 500M 0 part /boot/efi
>> >> >>> ├─sda2 8:2 0 16G 0 part [SWAP]
>> >> >>> ├─sda3 8:3 0 49,3G 0 part /
>> >> >>> └─sda4 8:4 0 400G 0 part
>> >> >>> sr0 11:0 1 3,7G 0 rom
>> >> >>>
>> >> >>> # salt -I 'roles:storage' cephdisks.list
>> >> >>> node01:
>> >> >>> node02:
>> >> >>> node03:
>> >> >>> node04:
>> >> >>> node05:
>> >> >>> node06:
>> >> >>>
>> >> >>> # salt -I 'roles:storage' pillar.get ceph
>> >> >>> node02:
>> >> >>> ----------
>> >> >>> storage:
>> >> >>> ----------
>> >> >>> osds:
>> >> >>> ----------
>> >> >>> /dev/sda4:
>> >> >>> ----------
>> >> >>> format:
>> >> >>> bluestore
>> >> >>> standalone:
>> >> >>> True
>> >> >>> (and so on for all 6 machines)
>> >> >>> ##########
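[The empty `cephdisks.list` output above looks like the real smoking gun: the osd.deploy exception ("Mine on nodeXX for cephdisks.list") is raised when the Salt mine has no data for that function. One thing worth trying before re-running the stage, assuming DeepSea's standard modules (a sketch, not guaranteed to be the fix):

```shell
# sync modules, refresh the mine, then see if the minions report disks now
salt '*' saltutil.sync_all
salt '*' mine.update
salt -I 'roles:storage' mine.get '*' cephdisks.list
```

If `mine.get` still comes back empty, cephdisks.list itself finds no usable disks, which points back at the partition setup.]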
>> >> >>>
>> >> >>> Finally and just in case, my policy.cfg file reads:
>> >> >>>
>> >> >>> #########
>> >> >>> #cluster-unassigned/cluster/*.sls
>> >> >>> cluster-ceph/cluster/*.sls
>> >> >>> profile-default/cluster/*.sls
>> >> >>> profile-default/stack/default/ceph/minions/*yml
>> >> >>> config/stack/default/global.yml
>> >> >>> config/stack/default/ceph/cluster.yml
>> >> >>> role-master/cluster/node01.sls
>> >> >>> role-admin/cluster/*.sls
>> >> >>> role-mon/cluster/*.sls
>> >> >>> role-mgr/cluster/*.sls
>> >> >>> role-mds/cluster/*.sls
>> >> >>> role-ganesha/cluster/*.sls
>> >> >>> role-client-nfs/cluster/*.sls
>> >> >>> role-client-cephfs/cluster/*.sls
>> >> >>> ##########
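[One more hedged note: after editing policy.cfg or the per-minion osd yml files, the pillar has to be regenerated before the deploy stage can see the change; with DeepSea that is stage 2 (a sketch):

```shell
# regenerate the pillar, then verify what the storage nodes actually see
salt-run state.orch ceph.stage.2
salt -I 'roles:storage' pillar.get ceph
```
]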
>> >> >>>
>> >> >>> Please, could someone help me and shed some light on this issue?
>> >> >>>
>> >> >>> Thanks a lot in advance,
>> >> >>>
>> >> >>> Regards,
>> >> >>>
>> >> >>> Jones
>> >> >>
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> ceph-users mailing list
>> >> >> ceph-users@xxxxxxxxxxxxxx
>> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>