Re: Ceph-Deploy error on 15/71 stage

Hi Eugen.

Sorry for the delay in answering.

I just looked in the /var/log/ceph/ directory. It only contains the following files (for example, on node01):

#######
# ls -lart
total 3864
-rw------- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
drwxr-xr-x 1 root root     898 ago 28 10:07 ..
-rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
-rw------- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
-rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
-rw------- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
-rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
-rw------- 1 ceph ceph  175229 ago 29 12:48 ceph.log
-rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
#######

So it only contains logs concerning the node itself (is that correct? Since node01 is also the master, I was expecting it to have logs from the other nodes too) and, moreover, there are no ceph-osd* files at all. I have also looked through the logs that are available, and nothing stands out (sorry for my poor English) as a possible error.
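
In case it helps, this is more or less how I scanned them, nothing more sophisticated than a grep (so I may well be missing something):

#######
# grep -iE "error|fail" /var/log/ceph/ceph.log /var/log/ceph/ceph-mon.node01.log /var/log/ceph/ceph-mgr.node01.log
#######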

Any suggestion on how to proceed?

Thanks a lot in advance,

Jones


On Mon, Aug 27, 2018 at 5:29 AM Eugen Block <eblock@xxxxxx> wrote:
Hi Jones,

All Ceph logs are in the directory /var/log/ceph/; each daemon has its
own log file, e.g. OSD logs are named ceph-osd.*.
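
If OSD logs do exist, a quick way to scan them is something like this
(just a generic pattern, adjust it as needed):

  grep -i -E "error|fail" /var/log/ceph/ceph-osd.*.log

If there are no ceph-osd.* files at all, that in itself is a hint that
the OSD daemons were never created on that node.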

I haven't tried it, but I don't think SUSE Enterprise Storage deploys
OSDs on partitioned disks. Is there a way to attach a second disk to
the OSD nodes, maybe via USB or something?

Although this thread is Ceph related, it refers to a specific
product, so I would recommend posting your question in the SUSE forum
[1].

Regards,
Eugen

[1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage

Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:

> Hi Eugen.
>
> Thanks for the suggestion. I'll look for the logs (since it's our first
> attempt with ceph, I'll have to discover where they are, but no problem).
>
> One thing called my attention on your response however:
>
> I may not have made myself clear, but one of the failures we encountered
> was that the files that now contain the following:
>
> node02:
>    ----------
>    storage:
>        ----------
>        osds:
>            ----------
>            /dev/sda4:
>                ----------
>                format:
>                    bluestore
>                standalone:
>                    True
>
> were originally empty, and we filled them in by hand following a model found
> elsewhere on the web. That was necessary so that we could continue, but the
> model indicated that, for example, it should have the path /dev/sda here,
> not /dev/sda4. We chose to include the specific partition because we won't
> have dedicated disks here, only that same partition, as all the disks were
> partitioned exactly the same.
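>
> For reference, the file we wrote for each node looks roughly like this (I'm
> reproducing it from memory, so take it only as an approximation of what the
> pillar output above comes from):
>
> ceph:
>   storage:
>     osds:
>       /dev/sda4:
>         format: bluestore
>         standalone: True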
>
> While that was enough for the procedure to continue at that point, I now
> wonder whether it was the right call and, if it indeed was, whether it was
> done properly. So I wonder: what do you mean by "wipe" the partition here?
> /dev/sda4 is created, but it is both empty and unmounted. Should a different
> operation be performed on it, should I remove it first, or should I have
> written the files above with only /dev/sda as the target?
>
> I know I probably wouldn't run into these issues with dedicated disks,
> but unfortunately that is absolutely not an option.
>
> Thanks a lot in advance for any comments and/or extra suggestions.
>
> Sincerely yours,
>
> Jones
>
> On Sat, Aug 25, 2018 at 5:46 PM Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi,
>>
>> Take a look at the logs; they should point you in the right direction.
>> Since the deployment stage fails at the OSD level, start with the OSD
>> logs. Something's not right with the disks/partitions; did you wipe
>> the partition from previous attempts?
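>>
>> If not, something along these lines usually does it (careful, it wipes
>> everything on that partition):
>>
>>   wipefs --all /dev/sda4
>>   dd if=/dev/zero of=/dev/sda4 bs=1M count=100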
>>
>> Regards,
>> Eugen
>>
>> Quoting Jones de Andrade <johannesrs@xxxxxxxxx>:
>>
>>> (Please forgive my previous email: I was reusing another message and
>>> completely forgot to update the subject.)
>>>
>>> Hi all.
>>>
>>> I'm new to ceph, and after having serious problems in ceph stages 0, 1 and
>>> 2 that I could solve myself, now it seems that I have hit a wall harder
>>> than my head. :)
>>>
>>> When I run salt-run state.orch ceph.stage.deploy and monitor it, I see it
>>> going up to here:
>>>
>>> #######
>>> [14/71]   ceph.sysctl on
>>>           node01....................................... ✓ (0.5s)
>>>           node02........................................ ✓ (0.7s)
>>>           node03....................................... ✓ (0.6s)
>>>           node04......................................... ✓ (0.5s)
>>>           node05....................................... ✓ (0.6s)
>>>           node06.......................................... ✓ (0.5s)
>>>
>>> [15/71]   ceph.osd on
>>>           node01...................................... ❌ (0.7s)
>>>           node02........................................ ❌ (0.7s)
>>>           node03....................................... ❌ (0.7s)
>>>           node04......................................... ❌ (0.6s)
>>>           node05....................................... ❌ (0.6s)
>>>           node06.......................................... ❌ (0.7s)
>>>
>>> Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
>>>
>>> Failures summary:
>>>
>>> ceph.osd (/srv/salt/ceph/osd):
>>>   node02:
>>>     deploy OSDs: Module function osd.deploy threw an exception.
>>>     Exception: Mine on node02 for cephdisks.list
>>>   node03:
>>>     deploy OSDs: Module function osd.deploy threw an exception.
>>>     Exception: Mine on node03 for cephdisks.list
>>>   node01:
>>>     deploy OSDs: Module function osd.deploy threw an exception.
>>>     Exception: Mine on node01 for cephdisks.list
>>>   node04:
>>>     deploy OSDs: Module function osd.deploy threw an exception.
>>>     Exception: Mine on node04 for cephdisks.list
>>>   node05:
>>>     deploy OSDs: Module function osd.deploy threw an exception.
>>>     Exception: Mine on node05 for cephdisks.list
>>>   node06:
>>>     deploy OSDs: Module function osd.deploy threw an exception.
>>>     Exception: Mine on node06 for cephdisks.list
>>> #######
>>>
>>> Since this is a first attempt on 6 simple test machines, we are going to
>>> put the mon, osd, etc. roles on all nodes at first. Only the master role is
>>> kept on a single machine (node01) for now.
>>>
>>> As they are simple machines, each has a single HDD, which is partitioned
>>> as follows (the sda4 partition is unmounted and left for the ceph system):
>>>
>>> ###########
>>> # lsblk
>>> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>>> sda      8:0    0 465,8G  0 disk
>>> ├─sda1   8:1    0   500M  0 part /boot/efi
>>> ├─sda2   8:2    0    16G  0 part [SWAP]
>>> ├─sda3   8:3    0  49,3G  0 part /
>>> └─sda4   8:4    0   400G  0 part
>>> sr0     11:0    1   3,7G  0 rom
>>>
>>> # salt -I 'roles:storage' cephdisks.list
>>> node01:
>>> node02:
>>> node03:
>>> node04:
>>> node05:
>>> node06:
>>>
>>> # salt -I 'roles:storage' pillar.get ceph
>>> node02:
>>>     ----------
>>>     storage:
>>>         ----------
>>>         osds:
>>>             ----------
>>>             /dev/sda4:
>>>                 ----------
>>>                 format:
>>>                     bluestore
>>>                 standalone:
>>>                     True
>>> (and so on for all 6 machines)
>>> ##########
>>>
>>> Finally and just in case, my policy.cfg file reads:
>>>
>>> #########
>>> #cluster-unassigned/cluster/*.sls
>>> cluster-ceph/cluster/*.sls
>>> profile-default/cluster/*.sls
>>> profile-default/stack/default/ceph/minions/*yml
>>> config/stack/default/global.yml
>>> config/stack/default/ceph/cluster.yml
>>> role-master/cluster/node01.sls
>>> role-admin/cluster/*.sls
>>> role-mon/cluster/*.sls
>>> role-mgr/cluster/*.sls
>>> role-mds/cluster/*.sls
>>> role-ganesha/cluster/*.sls
>>> role-client-nfs/cluster/*.sls
>>> role-client-cephfs/cluster/*.sls
>>> ##########
>>>
>>> Please, could someone help me and shed some light on this issue?
>>>
>>> Thanks a lot in advance,
>>>
>>> Regards,
>>>
>>> Jones
>>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
