Re: Ceph-Deploy error on 15/71 stage

Eugen Block <eblock@xxxxxx> · Thu, 30 Aug 2018 06:58:27 +0000

Hi,

> So, it only contains logs concerning the node itself (is that correct? Since
> node01 is also the master, I was expecting it to have logs from the other
> nodes too) and, moreover, no ceph-osd* files. Also, I've looked through the
> logs I have available, and nothing stands out as a possible error.

Logging is not centralised by default; you would have to configure that
yourself.

Regarding the OSDs: if OSD logs are created at all, they're created on
the OSD nodes, not on the master. But since the OSD deployment fails,
there are probably no OSD-specific logs yet. So you'll have to take a
look at the syslog (/var/log/messages); that's where the salt-minion
reports its attempts to create the OSDs. Chances are high that you'll
find the root cause there.
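A minimal sketch of that search, assuming the relevant syslog entries contain the string "salt-minion" and mention "osd" (both search terms are assumptions, and osd_minion_lines is a hypothetical helper name):

```shell
# Sketch: print syslog lines written by salt-minion that mention OSDs.
# Default path is /var/log/messages as suggested above; the two search
# terms ("salt-minion", "osd") are assumptions, adjust them as needed.
osd_minion_lines() {
    logfile="${1:-/var/log/messages}"
    # -i: case-insensitive, so "OSD" and "osd" both match
    grep -i 'salt-minion' "$logfile" | grep -i 'osd'
}
```

On systems that log only to the journal, journalctl -u salt-minion is the equivalent place to look, and the minion's own log file (/var/log/salt/minion by default) may also be worth checking.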

If that output is not detailed enough, set the salt-minion log level to
debug in /etc/salt/minion:

osd-1:~ # grep -E "^log_level" /etc/salt/minion
log_level: debug
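If that grep shows no such line, a sketch like the following could set it. It assumes the stock config, where the option may only exist as a commented-out "#log_level:" line; set_minion_debug is a hypothetical name.

```shell
# Sketch: ensure a Salt minion config contains "log_level: debug".
# The path defaults to /etc/salt/minion (as above); run as root.
set_minion_debug() {
    conf="${1:-/etc/salt/minion}"
    if grep -qE '^#?[[:space:]]*log_level:' "$conf"; then
        # Rewrite an existing (possibly commented-out) log_level line in place,
        # keeping a .bak copy of the original file.
        sed -i.bak -E 's/^#?[[:space:]]*log_level:.*/log_level: debug/' "$conf"
    else
        # No log_level line yet: append one.
        printf 'log_level: debug\n' >> "$conf"
    fi
}
```

The minion has to be restarted afterwards (for example with systemctl restart salt-minion) before the new level takes effect.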

Regards,
Eugen

Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:

Hi Eugen.

Sorry for the delay in answering.

Just looked in the /var/log/ceph/ directory. It only contains the following
files (for example on node01):

#######
# ls -lart
total 3864
-rw------- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
drwxr-xr-x 1 root root     898 ago 28 10:07 ..
-rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
-rw------- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
-rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
-rw------- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
-rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
-rw------- 1 ceph ceph  175229 ago 29 12:48 ceph.log
-rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
#######

So, it only contains logs concerning the node itself (is that correct? Since
node01 is also the master, I was expecting it to have logs from the other
nodes too) and, moreover, no ceph-osd* files. Also, I've looked through the
logs I have available, and nothing stands out as a possible error.

Any suggestion on how to proceed?

Thanks a lot in advance,

Jones

On Mon, Aug 27, 2018 at 5:29 AM Eugen Block <eblock@xxxxxx> wrote:

Hi Jones,

All Ceph logs are in the directory /var/log/ceph/; each daemon has its
own log file, e.g. OSD logs are named ceph-osd.*.
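As a quick per-node check, a small sketch that lists those files, assuming the default /var/log/ceph directory (list_osd_logs is a hypothetical helper name):

```shell
# Sketch: list OSD log files (ceph-osd.*) in a Ceph log directory.
# Returns non-zero and prints a note on stderr when none are found.
list_osd_logs() {
    dir="${1:-/var/log/ceph}"
    found=0
    for f in "$dir"/ceph-osd.*; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        printf 'no ceph-osd.* logs in %s\n' "$dir" >&2
        return 1
    fi
}
```

An empty result on a storage node is expected as long as no OSD daemon has been started there.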

I haven't tried it, but I don't think SUSE Enterprise Storage deploys
OSDs on partitioned disks. Is there a way to attach a second disk to
the OSD nodes, maybe via USB or something?

Although this thread is Ceph-related, it refers to a specific product,
so I would recommend posting your question in the SUSE forum [1].

Regards,
Eugen

[1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage

Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:

> Hi Eugen.
>
> Thanks for the suggestion. I'll look for the logs (since it's our first
> attempt with ceph, I'll have to discover where they are, but no problem).
>
> One thing in your response caught my attention, however:
>
> I didn't make this clear before, but one of the problems we encountered was
> that the files that now contain:
>
> node02:
>    ----------
>    storage:
>        ----------
>        osds:
>            ----------
>            /dev/sda4:
>                ----------
>                format:
>                    bluestore
>                standalone:
>                    True
>
> were originally empty, and we filled them in by hand following a model found
> elsewhere on the web. That was necessary for us to continue, but the model
> indicated that, for example, it should contain the path /dev/sda here, not
> /dev/sda4. We chose to include the specific partition because we won't have
> dedicated disks here, just the same partition on every node, since all disks
> were partitioned identically.
>
> While that was enough for the procedure to continue at that point, I now
> wonder whether it was the right call and, if it was, whether it was done
> properly. So: what do you mean by "wipe" the partition here? /dev/sda4 is
> created, but it is both empty and unmounted. Should a different operation
> be performed on it, should I remove it first, or should I have written the
> files above with only /dev/sda as the target?
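For readers less used to Salt's nested output: the pillar above corresponds to YAML of the following shape. This sketch only prints it for a given device; osd_profile_yaml is a hypothetical name, where the file should live is left open, and whether DeepSea accepts a partition instead of a whole disk is exactly the open question here.

```shell
# Sketch: print per-minion storage YAML in the shape shown by
# "pillar.get ceph" in this thread (ceph > storage > osds > device).
osd_profile_yaml() {
    dev="$1"
    cat <<EOF
ceph:
  storage:
    osds:
      $dev:
        format: bluestore
        standalone: True
EOF
}
```

For example, osd_profile_yaml /dev/sda4 reproduces the structure the pillar reports for node02.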
>
> I know that I probably wouldn't run into these issues with dedicated disks,
> but unfortunately that is absolutely not an option.
>
> Thanks a lot in advance for any comments and/or extra suggestions.
>
> Sincerely yours,
>
> Jones
>
> On Sat, Aug 25, 2018 at 5:46 PM Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi,
>>
>> take a look at the logs; they should point you in the right direction.
>> Since the deployment stage fails at the OSD level, start with the OSD
>> logs. Something's not right with the disks/partitions. Did you wipe
>> the partition after previous attempts?
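One possible way to do such a wipe is to overwrite the start of the partition with zeros. This is a sketch that assumes everything on the target may be destroyed; the function name zero_head and the default of 100 MiB are arbitrary choices for illustration.

```shell
# Sketch: overwrite the first N MiB (default 100) of a target with zeros.
# DESTRUCTIVE: on a real partition such as /dev/sda4 this destroys any
# leftover data and signatures from earlier deployment attempts.
zero_head() {
    target="$1"
    mib="${2:-100}"
    # conv=notrunc keeps a regular file's size intact; harmless on a device.
    dd if=/dev/zero of="$target" bs=1048576 count="$mib" conv=notrunc 2>/dev/null
}
```

Run as root, e.g. zero_head /dev/sda4. A gentler alternative is wipefs --all /dev/sda4 (util-linux), which removes only the filesystem and partition-table signatures it recognises.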
>>
>> Regards,
>> Eugen
>>
>> Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:
>>
>>> (Please forgive my previous email: I reused another message and
>>> completely forgot to update the subject.)
>>>
>>> Hi all.
>>>
>>> I'm new to Ceph, and after having serious problems in Ceph stages 0, 1
>>> and 2 that I could solve myself, it now seems that I have hit a wall
>>> harder than my head. :)
>>>
>>> When I run salt-run state.orch ceph.stage.deploy and monitor it, I see
>>> it get this far:
>>>
>>> #######
>>> [14/71]   ceph.sysctl on
>>>           node01....................................... ✓ (0.5s)
>>>           node02........................................ ✓ (0.7s)
>>>           node03....................................... ✓ (0.6s)
>>>           node04......................................... ✓ (0.5s)
>>>           node05....................................... ✓ (0.6s)
>>>           node06.......................................... ✓ (0.5s)
>>>
>>> [15/71]   ceph.osd on
>>>           node01...................................... ❌ (0.7s)
>>>           node02........................................ ❌ (0.7s)
>>>           node03....................................... ❌ (0.7s)
>>>           node04......................................... ❌ (0.6s)
>>>           node05....................................... ❌ (0.6s)
>>>           node06.......................................... ❌ (0.7s)
>>>
>>> Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
>>>
>>> Failures summary:
>>>
>>> ceph.osd (/srv/salt/ceph/osd):
>>>   node02:
>>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
>>>   node03:
>>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
>>>   node01:
>>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
>>>   node04:
>>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
>>>   node05:
>>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
>>>   node06:
>>>     deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
>>> #######
>>>
>>> Since this is a first attempt on 6 simple test machines, we are going to
>>> put the MONs, OSDs, etc. on all nodes at first. For now, only the master
>>> role is confined to a single machine (node01).
>>>
>>> As they are simple machines, they have a single HDD, which is partitioned
>>> as follows (the sda4 partition is unmounted and left for the Ceph system):
>>>
>>> ###########
>>> # lsblk
>>> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>> sda      8:0    0 465,8G  0 disk
>>> ├─sda1   8:1    0   500M  0 part /boot/efi
>>> ├─sda2   8:2    0    16G  0 part [SWAP]
>>> ├─sda3   8:3    0  49,3G  0 part /
>>> └─sda4   8:4    0   400G  0 part
>>> sr0     11:0    1   3,7G  0 rom
>>>
>>> # salt -I 'roles:storage' cephdisks.list
>>> node01:
>>> node02:
>>> node03:
>>> node04:
>>> node05:
>>> node06:
>>>
>>> # salt -I 'roles:storage' pillar.get ceph
>>> node02:
>>>     ----------
>>>     storage:
>>>         ----------
>>>         osds:
>>>             ----------
>>>             /dev/sda4:
>>>                 ----------
>>>                 format:
>>>                     bluestore
>>>                 standalone:
>>>                     True
>>> (and so on for all 6 machines)
>>> ##########
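Note that cephdisks.list returns nothing for any of the storage nodes, which seems to match the "Mine on nodeXX for cephdisks.list" exceptions above. A small sketch to flag such minions in saved salt output, assuming the default nested outputter with minion IDs in column 0 and their data indented (empty_minions is a hypothetical helper):

```shell
# Sketch: read salt nested output on stdin and print every minion ID
# whose block contains no indented data lines.
empty_minions() {
    awk '
        /^[^ \t].*:$/ {                     # minion header, e.g. "node01:"
            if (name != "" && !seen) print name
            name = substr($0, 1, length($0) - 1); seen = 0; next
        }
        /^[ \t]+[^ \t]/ { seen = 1 }        # indented data for current minion
        END { if (name != "" && !seen) print name }
    '
}
```

Usage: salt --no-color -I 'roles:storage' cephdisks.list | empty_minions. Every node showing up here would be consistent with the suspicion elsewhere in this thread that the partition is not picked up as a candidate disk.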
>>>
>>> Finally and just in case, my policy.cfg file reads:
>>>
>>> #########
>>> #cluster-unassigned/cluster/*.sls
>>> cluster-ceph/cluster/*.sls
>>> profile-default/cluster/*.sls
>>> profile-default/stack/default/ceph/minions/*yml
>>> config/stack/default/global.yml
>>> config/stack/default/ceph/cluster.yml
>>> role-master/cluster/node01.sls
>>> role-admin/cluster/*.sls
>>> role-mon/cluster/*.sls
>>> role-mgr/cluster/*.sls
>>> role-mds/cluster/*.sls
>>> role-ganesha/cluster/*.sls
>>> role-client-nfs/cluster/*.sls
>>> role-client-cephfs/cluster/*.sls
>>> ##########
>>>
>>> Please, could someone help me and shed some light on this issue?
>>>
>>> Thanks a lot in advance,
>>>
>>> Regards,
>>>
>>> Jones
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
