Re: Ceph-Deploy error on 15/71 stage

Eugen Block <eblock@xxxxxx> · Fri, 31 Aug 2018 07:00:12 +0000

Hi,

I'm not sure if there's a misunderstanding. You need to track the logs  
during the osd deployment step (stage.3), that is where it fails, and  
this is where /var/log/messages could be useful. Since the deployment  
failed you have no systemd-units (ceph-osd@<ID>.service) to log  
anything.

Before running stage.3 again try something like

grep -C5 ceph-disk /var/log/messages (or messages-201808*.xz)

or

grep -C5 sda4 /var/log/messages (or messages-201808*.xz)

If that doesn't reveal anything run stage.3 again and watch the logs.
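
In case it helps, the two greps above can be wrapped so that the rotated, xz-compressed files are searched as well. This is only a sketch: the function name search_messages is made up for illustration, and the default file names are assumed from the ones mentioned above, so adjust them to the rotation naming on your node.

```shell
# Search current and rotated syslog files for a pattern, 5 lines of context.
# Usage: search_messages PATTERN [FILE...]
# (search_messages is a hypothetical helper, not part of ceph or salt)
search_messages() {
    pattern=$1
    shift
    # Assumed default file set, taken from the names used in this thread.
    [ "$#" -gt 0 ] || set -- /var/log/messages /var/log/messages-201808*.xz
    for f in "$@"; do
        [ -e "$f" ] || continue    # skip missing files and unmatched globs
        case "$f" in
            *.xz) xz -dc -- "$f" | grep -C5 -e "$pattern" ;;  # decompress to stdout
            *)    grep -C5 -e "$pattern" -- "$f" ;;
        esac
    done
}
```

For example, search_messages ceph-disk or search_messages sda4 would cover both cases mentioned above.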

Regards,
Eugen

Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:

Hi Eugen.

Ok, edited the file /etc/salt/minion, uncommented the "log_level_logfile"
line and set it to "debug" level.

Turned off the computer, waited a few minutes so that the time frame would
stand out in the /var/log/messages file, and restarted the computer.

Using vi I "grepped out" (awful wording) the reboot section. From that, I
also removed most of what seemed totally unrelated to ceph, salt,
minions, grafana, prometheus, whatever.

I got the lines below. It does not seem to complain about anything that I
can see. :(

################
2018-08-30T15:41:46.455383-03:00 torcello systemd[1]: systemd 234 running
in system mode. (+PAM -AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINIT +UTMP
+LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS
+KMOD -IDN2 -IDN default-hierarchy=hybrid)
2018-08-30T15:41:46.456330-03:00 torcello systemd[1]: Detected architecture
x86-64.
2018-08-30T15:41:46.456350-03:00 torcello systemd[1]: nss-lookup.target:
Dependency Before=nss-lookup.target dropped
2018-08-30T15:41:46.456357-03:00 torcello systemd[1]: Started Load Kernel
Modules.
2018-08-30T15:41:46.456369-03:00 torcello systemd[1]: Starting Apply Kernel
Variables...
2018-08-30T15:41:46.457230-03:00 torcello systemd[1]: Started Alertmanager
for prometheus.
2018-08-30T15:41:46.457237-03:00 torcello systemd[1]: Started Monitoring
system and time series database.
2018-08-30T15:41:46.457403-03:00 torcello systemd[1]: Starting NTP
client/server...

2018-08-30T15:41:46.457425-03:00 torcello systemd[1]: Started Prometheus
exporter for machine metrics.
2018-08-30T15:41:46.457706-03:00 torcello prometheus[695]: level=info
ts=2018-08-30T18:41:44.797896888Z caller=main.go:225 msg="Starting
Prometheus" version="(version=2.1.0, branch=non-git, revision=non-git)"
2018-08-30T15:41:46.457712-03:00 torcello prometheus[695]: level=info
ts=2018-08-30T18:41:44.797969232Z caller=main.go:226
build_context="(go=go1.9.4, user=abuild@lamb69, date=20180513-03:46:03)"
2018-08-30T15:41:46.457719-03:00 torcello prometheus[695]: level=info
ts=2018-08-30T18:41:44.798008802Z caller=main.go:227 host_details="(Linux
4.12.14-lp150.12.4-default #1 SMP Tue May 22 05:17:22 UTC 2018 (66b2eda)
x86_64 torcello (none))"
2018-08-30T15:41:46.457726-03:00 torcello prometheus[695]: level=info
ts=2018-08-30T18:41:44.798044088Z caller=main.go:228
fd_limits="(soft=1024, hard=4096)"
2018-08-30T15:41:46.457738-03:00 torcello prometheus[695]: level=info
ts=2018-08-30T18:41:44.802067189Z caller=web.go:383 component=web
msg="Start listening for connections" address=0.0.0.0:9090
2018-08-30T15:41:46.457745-03:00 torcello prometheus[695]: level=info
ts=2018-08-30T18:41:44.802037354Z caller=main.go:499 msg="Starting TSDB ..."
2018-08-30T15:41:46.458145-03:00 torcello smartd[809]: Monitoring 1
ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
2018-08-30T15:41:46.458321-03:00 torcello systemd[1]: Started NTP
client/server.
2018-08-30T15:41:50.387157-03:00 torcello ceph_exporter[690]: 2018/08/30
15:41:50 Starting ceph exporter on ":9128"
2018-08-30T15:41:52.658272-03:00 torcello wicked[905]: lo              up
2018-08-30T15:41:52.658738-03:00 torcello wicked[905]: eth0            up
2018-08-30T15:41:52.659989-03:00 torcello systemd[1]: Started wicked
managed network interfaces.
2018-08-30T15:41:52.660514-03:00 torcello systemd[1]: Reached target
Network.
2018-08-30T15:41:52.667938-03:00 torcello systemd[1]: Starting OpenSSH
Daemon...
2018-08-30T15:41:52.668292-03:00 torcello systemd[1]: Reached target
Network is Online.

2018-08-30T15:41:52.669132-03:00 torcello systemd[1]: Started Ceph cluster
monitor daemon.
2018-08-30T15:41:52.669328-03:00 torcello systemd[1]: Reached target ceph
target allowing to start/stop all ceph-mon@.service instances at once.
2018-08-30T15:41:52.670346-03:00 torcello systemd[1]: Started Ceph cluster
manager daemon.
2018-08-30T15:41:52.670565-03:00 torcello systemd[1]: Reached target ceph
target allowing to start/stop all ceph-mgr@.service instances at once.
2018-08-30T15:41:52.670839-03:00 torcello systemd[1]: Reached target ceph
target allowing to start/stop all ceph*@.service instances at once.
2018-08-30T15:41:52.671246-03:00 torcello systemd[1]: Starting Login and
scanning of iSCSI devices...
2018-08-30T15:41:52.672402-03:00 torcello systemd[1]: Starting Grafana
instance...
2018-08-30T15:41:52.678922-03:00 torcello systemd[1]: Started Backup of
/etc/sysconfig.
2018-08-30T15:41:52.679109-03:00 torcello systemd[1]: Reached target Timers.
2018-08-30T15:41:52.679630-03:00 torcello systemd[1]: Started The Salt
API.
2018-08-30T15:41:52.692944-03:00 torcello systemd[1]: Starting Postfix Mail
Transport Agent...
2018-08-30T15:41:52.694687-03:00 torcello systemd[1]: Started The Salt
Master Server.
2018-08-30T15:41:52.696821-03:00 torcello systemd[1]: Starting The Salt
Minion...
2018-08-30T15:41:52.772750-03:00 torcello sshd-gen-keys-start[1408]:
Checking for missing server keys in /etc/ssh
2018-08-30T15:41:52.818695-03:00 torcello iscsiadm[1412]: iscsiadm: No
records found
2018-08-30T15:41:52.819541-03:00 torcello systemd[1]: Started Login and
scanning of iSCSI devices.
2018-08-30T15:41:52.820214-03:00 torcello systemd[1]: Reached target Remote
File Systems.
2018-08-30T15:41:52.821418-03:00 torcello systemd[1]: Starting Permit User
Sessions...
2018-08-30T15:41:53.045278-03:00 torcello systemd[1]: Started Permit User
Sessions.
2018-08-30T15:41:53.048482-03:00 torcello systemd[1]: Starting Hold until
boot process finishes up...
2018-08-30T15:41:53.054461-03:00 torcello echo[1415]: Starting mail service
(Postfix)
2018-08-30T15:41:53.447390-03:00 torcello sshd[1431]: Server listening on
0.0.0.0 port 22.
2018-08-30T15:41:53.447685-03:00 torcello sshd[1431]: Server listening on
:: port 22.
2018-08-30T15:41:53.447907-03:00 torcello systemd[1]: Started OpenSSH
Daemon.

2018-08-30T15:41:54.519192-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Starting Grafana" logger=server
version=5.1.3 commit=NA compiled=2018-08-30T15:41:53-0300
2018-08-30T15:41:54.519664-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Config loaded from"
logger=settings file=/usr/share/grafana/conf/defaults.ini
2018-08-30T15:41:54.519979-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Config loaded from"
logger=settings file=/etc/grafana/grafana.ini
2018-08-30T15:41:54.520257-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command
line" logger=settings arg="default.paths.data=/var/lib/grafana"
2018-08-30T15:41:54.520546-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command
line" logger=settings arg="default.paths.logs=/var/log/grafana"
2018-08-30T15:41:54.520823-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command
line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
2018-08-30T15:41:54.521085-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Config overridden from command
line" logger=settings
arg="default.paths.provisioning=/etc/grafana/provisioning"
2018-08-30T15:41:54.521343-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Path Home" logger=settings
path=/usr/share/grafana
2018-08-30T15:41:54.521593-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Path Data" logger=settings
path=/var/lib/grafana
2018-08-30T15:41:54.521843-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Path Logs" logger=settings
path=/var/log/grafana
2018-08-30T15:41:54.522108-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Path Plugins" logger=settings
path=/var/lib/grafana/plugins
2018-08-30T15:41:54.522361-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Path Provisioning"
logger=settings path=/etc/grafana/provisioning
2018-08-30T15:41:54.522611-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="App mode production"
logger=settings
2018-08-30T15:41:54.522885-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Writing PID file" logger=server
path=/var/run/grafana/grafana-server.pid pid=1413
2018-08-30T15:41:54.523148-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Initializing DB" logger=sqlstore
dbtype=sqlite3
2018-08-30T15:41:54.523398-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Starting DB migration"
logger=migrator
2018-08-30T15:41:54.804052-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Executing migration"
logger=migrator id="copy data account to org"
2018-08-30T15:41:54.804423-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Skipping migration condition not
fulfilled" logger=migrator id="copy data account to org"
2018-08-30T15:41:54.804724-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Executing migration"
logger=migrator id="copy data account_user to org_user"
2018-08-30T15:41:54.804985-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Skipping migration condition not
fulfilled" logger=migrator id="copy data account_user to org_user"
2018-08-30T15:41:54.838327-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:54-0300 lvl=info msg="Starting plugin search"
logger=plugins
2018-08-30T15:41:54.947408-03:00 torcello systemd[1]: Starting Locale
Service...
2018-08-30T15:41:54.979069-03:00 torcello systemd[1]: Started Locale
Service.

2018-08-30T15:41:55.023859-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=info msg="Registering plugin" logger=plugins
name=Discrete
2018-08-30T15:41:55.028462-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=info msg="Registering plugin" logger=plugins
name=Monasca
2018-08-30T15:41:55.065227-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=eror msg="can't read datasource provisioning
files from directory" logger=provisioning.datasources
path=/etc/grafana/provisioning/datasources
2018-08-30T15:41:55.065462-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=eror msg="can't read dashboard provisioning
files from directory" logger=provisioning.dashboard
path=/etc/grafana/provisioning/dashboards
2018-08-30T15:41:55.065636-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing Alerting"
logger=alerting.engine
2018-08-30T15:41:55.065779-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing CleanUpService"
logger=cleanup
2018-08-30T15:41:55.274779-03:00 torcello systemd[1]: Started Grafana
instance.
2018-08-30T15:41:55.313056-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing Stream Manager"
2018-08-30T15:41:55.313251-03:00 torcello grafana-server[1413]:
t=2018-08-30T15:41:55-0300 lvl=info msg="Initializing HTTP Server"
logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=
2018-08-30T15:41:58.304749-03:00 torcello systemd[1]: Started Command
Scheduler.
2018-08-30T15:41:58.381694-03:00 torcello systemd[1]: Started The Salt
Minion.
2018-08-30T15:41:58.386643-03:00 torcello cron[1611]: (CRON) INFO
(RANDOM_DELAY will be scaled with factor 11% if used.)
2018-08-30T15:41:58.396087-03:00 torcello cron[1611]: (CRON) INFO (running
with inotify support)
2018-08-30T15:42:06.367096-03:00 torcello systemd[1]: Started Hold until
boot process finishes up.
2018-08-30T15:42:06.369301-03:00 torcello systemd[1]: Started Getty on tty1.
2018-08-30T15:42:11.535310-03:00 torcello systemd[1792]: Reached target
Paths.
2018-08-30T15:42:11.536128-03:00 torcello systemd[1792]: Starting D-Bus
User Message Bus Socket.
2018-08-30T15:42:11.536378-03:00 torcello systemd[1792]: Reached target
Timers.
2018-08-30T15:42:11.598968-03:00 torcello systemd[1792]: Listening on D-Bus
User Message Bus Socket.
2018-08-30T15:42:11.599151-03:00 torcello systemd[1792]: Reached target
Sockets.
2018-08-30T15:42:11.599277-03:00 torcello systemd[1792]: Reached target
Basic System.
2018-08-30T15:42:11.599398-03:00 torcello systemd[1792]: Reached target
Default.
2018-08-30T15:42:11.599514-03:00 torcello systemd[1792]: Startup finished
in 145ms.
2018-08-30T15:42:11.599636-03:00 torcello systemd[1]: Started User Manager
for UID 464.
2018-08-30T15:42:12.471869-03:00 torcello systemd[1792]: Started D-Bus User
Message Bus.
2018-08-30T15:42:15.898853-03:00 torcello systemd[1]: Starting Disk
Manager...
2018-08-30T15:42:15.974641-03:00 torcello systemd[1]: Started Disk Manager.
2018-08-30T15:42:16.897412-03:00 torcello node_exporter[807]:
time="2018-08-30T15:42:16-03:00" level=error msg="ERROR: ntp collector
failed after 0.000087s: couldn't get SNTP reply: read udp
127.0.0.1:42089->127.0.0.1:123: read: connection refused"
source="collector.go:123"
2018-08-30T15:42:17.589461-03:00 torcello chronyd[845]: Selected source
200.189.40.8
2018-08-30T15:43:16.899040-03:00 torcello node_exporter[807]:
time="2018-08-30T15:43:16-03:00" level=error msg="ERROR: ntp collector
failed after 0.000105s: couldn't get SNTP reply: read udp
127.0.0.1:59525->127.0.0.1:123: read: connection refused"
source="collector.go:123"
2018-08-30T15:44:15.496595-03:00 torcello systemd[1792]: Stopped target
Default.
2018-08-30T15:44:15.496824-03:00 torcello systemd[1792]: Stopping D-Bus
User Message Bus...
2018-08-30T15:44:15.502438-03:00 torcello systemd[1792]: Stopped D-Bus User
Message Bus.
2018-08-30T15:44:15.502627-03:00 torcello systemd[1792]: Stopped target
Basic System.
2018-08-30T15:44:15.502776-03:00 torcello systemd[1792]: Stopped target
Paths.
2018-08-30T15:44:15.502923-03:00 torcello systemd[1792]: Stopped target
Timers.
2018-08-30T15:44:15.503062-03:00 torcello systemd[1792]: Stopped target
Sockets.
2018-08-30T15:44:15.503200-03:00 torcello systemd[1792]: Closed D-Bus User
Message Bus Socket.
2018-08-30T15:44:15.503356-03:00 torcello systemd[1792]: Reached target
Shutdown.
2018-08-30T15:44:15.503572-03:00 torcello systemd[1792]: Starting Exit the
Session...
2018-08-30T15:44:15.511298-03:00 torcello systemd[2295]: Starting D-Bus
User Message Bus Socket.
2018-08-30T15:44:15.511493-03:00 torcello systemd[2295]: Reached target
Timers.
2018-08-30T15:44:15.511664-03:00 torcello systemd[2295]: Reached target
Paths.
2018-08-30T15:44:15.517873-03:00 torcello systemd[2295]: Listening on D-Bus
User Message Bus Socket.
2018-08-30T15:44:15.518060-03:00 torcello systemd[2295]: Reached target
Sockets.
2018-08-30T15:44:15.518216-03:00 torcello systemd[2295]: Reached target
Basic System.
2018-08-30T15:44:15.518373-03:00 torcello systemd[2295]: Reached target
Default.
2018-08-30T15:44:15.518501-03:00 torcello systemd[2295]: Startup finished
in 31ms.
2018-08-30T15:44:15.518634-03:00 torcello systemd[1]: Started User Manager
for UID 1000.
2018-08-30T15:44:15.518759-03:00 torcello systemd[1792]: Received
SIGRTMIN+24 from PID 2300 (kill).
2018-08-30T15:44:15.537634-03:00 torcello systemd[1]: Stopped User Manager
for UID 464.
2018-08-30T15:44:15.538422-03:00 torcello systemd[1]: Removed slice User
Slice of sddm.
2018-08-30T15:44:15.613246-03:00 torcello systemd[2295]: Started D-Bus User
Message Bus.
2018-08-30T15:44:15.623989-03:00 torcello dbus-daemon[2311]: [session
uid=1000 pid=2311] Successfully activated service 'org.freedesktop.systemd1'
2018-08-30T15:44:16.447162-03:00 torcello kapplymousetheme[2350]:
kcm_input: Using X11 backend
2018-08-30T15:44:16.901642-03:00 torcello node_exporter[807]:
time="2018-08-30T15:44:16-03:00" level=error msg="ERROR: ntp collector
failed after 0.000205s: couldn't get SNTP reply: read udp
127.0.0.1:53434->127.0.0.1:123: read: connection refused"
source="collector.go:123"
################
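
For what it's worth, the manual trimming in vi could probably be approximated with a small filter like the one below. It is only a sketch under a few assumptions: syslog_window is a made-up name, every entry in the real /var/log/messages sits on a single line starting with an ISO 8601 timestamp (as in the excerpt above, before the mail client wrapped it), and all timestamps carry the same UTC offset, so that a plain string comparison orders them correctly.

```shell
# Print syslog entries inside a time window that mention given keywords.
# Usage: syslog_window FILE FROM TO [REGEX]
# (syslog_window is a hypothetical helper name)
syslog_window() {
    file=$1
    from=$2
    to=$3
    pattern=${4:-"ceph|salt"}    # assumed default keywords
    # ISO 8601 timestamps with one common offset sort as plain strings,
    # so field 1 can be compared directly against the window bounds.
    awk -v from="$from" -v to="$to" '$1 >= from && $1 <= to' "$file" |
        grep -iE -e "$pattern"
}
```

Something like syslog_window /var/log/messages 2018-08-30T15:41 2018-08-30T15:45 would then keep only the ceph and salt entries from the reboot above.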

Any ideas?

Thanks a lot,

Jones

On Thu, Aug 30, 2018 at 4:14 AM Eugen Block <eblock@xxxxxx> wrote:

Hi,

> So, it only contains logs concerning the node itself (is it correct? since
> node01 is also the master, I was expecting it to have logs from the others
> too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have
> available, and nothing "shines out" (sorry for my poor english) as a
> possible error.

the logging is not configured to be centralised by default, you would
have to configure that yourself.

Regarding the OSDs, if there are OSD logs created, they're created on
the OSD nodes, not on the master. But since the OSD deployment fails,
there probably are no OSD specific logs yet. So you'll have to take a
look into the syslog (/var/log/messages), that's where the salt-minion
reports its attempts to create the OSDs. Chances are high that you'll
find the root cause in here.

If the output is not enough, set the log-level to debug:

osd-1:~ # grep -E "^log_level" /etc/salt/minion
log_level: debug
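
If editing the file by hand is a nuisance, something along these lines could set both log levels at once. Treat it as a sketch: set_minion_debug is a made-up helper name, and it assumes the config still contains the (possibly commented-out) log_level and log_level_logfile lines; if they are missing, nothing is changed.

```shell
# Set log_level and log_level_logfile to "debug" in a Salt minion config.
# Usage: set_minion_debug [CONFIG_FILE]
# (set_minion_debug is a hypothetical helper name)
set_minion_debug() {
    conf=${1:-/etc/salt/minion}
    cp -p "$conf" "$conf.bak" || return 1    # keep a backup next to the file
    # Uncomment the two settings if needed and rewrite their values.
    sed -E 's/^#?[[:space:]]*(log_level|log_level_logfile):.*/\1: debug/' \
        "$conf.bak" > "$conf"
}
```

The minion has to be restarted afterwards (systemctl restart salt-minion) for the new level to take effect.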

Regards,
Eugen

Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:

> Hi Eugen.
>
> Sorry for the delay in answering.
>
> Just looked in the /var/log/ceph/ directory. It only contains the following
> files (for example on node01):
>
> #######
> # ls -lart
> total 3864
> -rw------- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
> drwxr-xr-x 1 root root     898 ago 28 10:07 ..
> -rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
> -rw------- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
> -rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
> -rw------- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
> drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
> -rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
> -rw------- 1 ceph ceph  175229 ago 29 12:48 ceph.log
> -rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
> #######
>
> So, it only contains logs concerning the node itself (is it correct? since
> node01 is also the master, I was expecting it to have logs from the others
> too) and, moreover, no ceph-osd* files. Also, I'm looking at the logs I have
> available, and nothing "shines out" (sorry for my poor english) as a
> possible error.
>
> Any suggestion on how to proceed?
>
> Thanks a lot in advance,
>
> Jones
>
>
> On Mon, Aug 27, 2018 at 5:29 AM Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi Jones,
>>
>> all ceph logs are in the directory /var/log/ceph/, each daemon has its
>> own log file, e.g. OSD logs are named ceph-osd.*.
>>
>> I haven't tried it but I don't think SUSE Enterprise Storage deploys
>> OSDs on partitioned disks. Is there a way to attach a second disk to
>> the OSD nodes, maybe via USB or something?
>>
>> Although this thread is ceph related it is referring to a specific
>> product, so I would recommend to post your question in the SUSE forum
>> [1].
>>
>> Regards,
>> Eugen
>>
>> [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage
>>
>> Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:
>>
>> > Hi Eugen.
>> >
>> > Thanks for the suggestion. I'll look for the logs (since it's our first
>> > attempt with ceph, I'll have to discover where they are, but no problem).
>> >
>> > One thing in your response caught my attention, however:
>> >
>> > I haven't made myself clear, but one of the failures we encountered was
>> > that the files now containing:
>> >
>> > node02:
>> >    ----------
>> >    storage:
>> >        ----------
>> >        osds:
>> >            ----------
>> >            /dev/sda4:
>> >                ----------
>> >                format:
>> >                    bluestore
>> >                standalone:
>> >                    True
>> >
>> > were originally empty, and we filled them in by hand following a model found
>> > elsewhere on the web. It was necessary so that we could continue, but the
>> > model indicated that, for example, it should have the path for /dev/sda
>> > here, not /dev/sda4. We chose to include the specific partition
>> > identification because we won't have dedicated disks here, rather just the
>> > very same partition, as all disks were partitioned exactly the same.
>> >
>> > While that was enough for the procedure to continue at that point, now I
>> > wonder if it was the right call and, if it indeed was, if it was done
>> > properly. As such, I wonder: what do you mean by "wipe" the partition here?
>> > /dev/sda4 is created, but is both empty and unmounted. Should a different
>> > operation be performed on it, should I remove it first, or should I have
>> > written the files above with only /dev/sda as target?
>> >
>> > I know that I probably wouldn't run into these issues with dedicated disks,
>> > but unfortunately that is absolutely not an option.
>> >
>> > Thanks a lot in advance for any comments and/or extra suggestions.
>> >
>> > Sincerely yours,
>> >
>> > Jones
>> >
>> > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block <eblock@xxxxxx> wrote:
>> >
>> >> Hi,
>> >>
>> >> take a look into the logs, they should point you in the right direction.
>> >> Since the deployment stage fails at the OSD level, start with the OSD
>> >> logs. Something's not right with the disks/partitions, did you wipe
>> >> the partition from previous attempts?
>> >>
>> >> Regards,
>> >> Eugen
>> >>
>> >> Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:
>> >>
>> >>> (Please forgive my previous email: I was using another message and
>> >>> completely forgot to update the subject)
>> >>>
>> >>> Hi all.
>> >>>
>> >>> I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
>> >>> 2 that I could solve myself, now it seems that I have hit a wall harder
>> >>> than my head. :)
>> >>>
>> >>> When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it
>> >>> going up to here:
>> >>>
>> >>> #######
>> >>> [14/71]   ceph.sysctl on
>> >>>           node01....................................... ✓ (0.5s)
>> >>>           node02........................................ ✓ (0.7s)
>> >>>           node03....................................... ✓ (0.6s)
>> >>>           node04......................................... ✓ (0.5s)
>> >>>           node05....................................... ✓ (0.6s)
>> >>>           node06.......................................... ✓ (0.5s)
>> >>>
>> >>> [15/71]   ceph.osd on
>> >>>           node01...................................... ❌ (0.7s)
>> >>>           node02........................................ ❌ (0.7s)
>> >>>           node03....................................... ❌ (0.7s)
>> >>>           node04......................................... ❌ (0.6s)
>> >>>           node05....................................... ❌ (0.6s)
>> >>>           node06.......................................... ❌ (0.7s)
>> >>>
>> >>> Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
>> >>>
>> >>> Failures summary:
>> >>>
>> >>> ceph.osd (/srv/salt/ceph/osd):
>> >>>   node02:
>> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
>> >>>   node03:
>> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
>> >>>   node01:
>> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
>> >>>   node04:
>> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
>> >>>   node05:
>> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
>> >>>   node06:
>> >>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
>> >>> #######
>> >>>
>> >>> Since this is a first attempt on 6 simple test machines, we are going to
>> >>> put the mon, osds, etc. on all nodes at first. Only the master is left
>> >>> on a single machine (node01) for now.
>> >>>
>> >>> As they are simple machines, they have a single hdd, which is partitioned
>> >>> as follows (the sda4 partition is unmounted and left for the ceph system):
>> >>>
>> >>> ###########
>> >>> # lsblk
>> >>> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> >>> sda      8:0    0 465,8G  0 disk
>> >>> ├─sda1   8:1    0   500M  0 part /boot/efi
>> >>> ├─sda2   8:2    0    16G  0 part [SWAP]
>> >>> ├─sda3   8:3    0  49,3G  0 part /
>> >>> └─sda4   8:4    0   400G  0 part
>> >>> sr0     11:0    1   3,7G  0 rom
>> >>>
>> >>> # salt -I 'roles:storage' cephdisks.list
>> >>> node01:
>> >>> node02:
>> >>> node03:
>> >>> node04:
>> >>> node05:
>> >>> node06:
>> >>>
>> >>> # salt -I 'roles:storage' pillar.get ceph
>> >>> node02:
>> >>>     ----------
>> >>>     storage:
>> >>>         ----------
>> >>>         osds:
>> >>>             ----------
>> >>>             /dev/sda4:
>> >>>                 ----------
>> >>>                 format:
>> >>>                     bluestore
>> >>>                 standalone:
>> >>>                     True
>> >>> (and so on for all 6 machines)
>> >>> ##########
>> >>>
>> >>> Finally and just in case, my policy.cfg file reads:
>> >>>
>> >>> #########
>> >>> #cluster-unassigned/cluster/*.sls
>> >>> cluster-ceph/cluster/*.sls
>> >>> profile-default/cluster/*.sls
>> >>> profile-default/stack/default/ceph/minions/*yml
>> >>> config/stack/default/global.yml
>> >>> config/stack/default/ceph/cluster.yml
>> >>> role-master/cluster/node01.sls
>> >>> role-admin/cluster/*.sls
>> >>> role-mon/cluster/*.sls
>> >>> role-mgr/cluster/*.sls
>> >>> role-mds/cluster/*.sls
>> >>> role-ganesha/cluster/*.sls
>> >>> role-client-nfs/cluster/*.sls
>> >>> role-client-cephfs/cluster/*.sls
>> >>> ##########
>> >>>
>> >>> Please, could someone help me and shed some light on this issue?
>> >>>
>> >>> Thanks a lot in advance,
>> >>>
>> >>> Regards,
>> >>>
>> >>> Jones
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@xxxxxxxxxxxxxx
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>>
>>
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
