Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD


 



Hi Michel,

I did not notice anything strange in the log files (I looked for errors and warnings).

The hardware is a DELL C6100 sled (from 2011) running an up-to-date Alma Linux 8. It uses 3 SATA disks.

Is there a way to force the OSD installation by hand by providing the device, /dev/sdc for example? A "do what I say" approach...

Would it be worth trying to deploy Octopus on the nodes, configure the OSDs (even if podman 4.2.0 is not validated for Octopus) and then upgrade to Pacific? Could this be a workaround for this sort of regression from Octopus to Pacific?

Maybe updating the BIOS from 1.7.1 to 1.8.1?


All this is a little bit confusing for me as I'm trying to discover Ceph 😁

Thanks

Patrick


On 26/05/2023 at 17:19, Michel Jouvin wrote:
Hi Patrick,

It is weird: we have a couple of clusters deployed with cephadm, running Pacific or Quincy, and ceph orch device works well. Have you looked at the cephadm logs (ceph log last cephadm)?

Unless you are using very specific hardware, I suspect Ceph is suffering from a problem outside of it...

Cheers,

Michel
Sent from my mobile

On 26 May 2023 at 17:02:50, Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi,

I'm back working on this problem.

First of all, I saw that I had a hardware memory error so I had to solve
this first. It's done.

I've tested some different Ceph deployments, each time starting with a
full OS re-install (it requires some time for each test).

Using Octopus, the devices are found:

    dnf -y install \
        https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm
    monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
    cephadm bootstrap --mon-ip "$monip" --initial-dashboard-password xxxxx \
                      --allow-fqdn-hostname

    [ceph: root@mostha1 /]# ceph orch device ls
    Hostname                      Path      Type  Serial           Size  Health   Ident  Fault  Available
    mostha1.legi.grenoble-inp.fr  /dev/sda  hdd   S2B5J90ZA02494   250G  Unknown  N/A    N/A    Yes
    mostha1.legi.grenoble-inp.fr  /dev/sdc  hdd   WD-WMAYP0982329  500G  Unknown  N/A    N/A    Yes


But with Pacific or Quincy the command returns nothing.

With Pacific:

    dnf -y install \
        https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
    monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
    cephadm bootstrap --mon-ip "$monip" --initial-dashboard-password xxxxx \
                      --allow-fqdn-hostname


"ceph orch device ls" doesn't return anything but "cephadm shell lsmcli
ldl" lists all the devices.

    [ceph: root@mostha1 /]# ceph orch device ls --wide
    [ceph: root@mostha1 /]# lsblk
    NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda                    8:0    1 232.9G 0 disk
    |-sda1                 8:1    1   3.9G 0 part /rootfs/boot
    |-sda2                 8:2    1  78.1G 0 part
    | `-osvg-rootvol     253:0    0  48.8G 0 lvm  /rootfs
    |-sda3                 8:3    1   3.9G 0 part [SWAP]
    `-sda4                 8:4    1 146.9G 0 part
       |-secretvg-homevol 253:1    0   9.8G 0 lvm  /rootfs/home
       |-secretvg-tmpvol  253:2    0   9.8G 0 lvm  /rootfs/tmp
       `-secretvg-varvol  253:3    0   9.8G 0 lvm  /rootfs/var
    sdb                    8:16   1 232.9G 0 disk
    sdc                    8:32   1 465.8G 0 disk
    [ceph: root@mostha1 /]# exit
    [root@mostha1 ~]# cephadm ceph-volume inventory
    Inferring fsid 2e3e85a8-fbcf-11ed-84e5-00266cf8869c
    Using ceph image with id '0dc91bca92c2' and tag 'v17' created on 2023-05-25 16:26:31 +0000 UTC
    quay.io/ceph/ceph@sha256:b8df01a568f4dec7bac6d5040f9391dcca14e00ec7f4de8a3dcf3f2a6502d3a9

    Device Path               Size         Device nodes    rotates available Model name

    [root@mostha1 ~]# cephadm shell lsmcli ldl
    Inferring fsid 4d54823c-fb05-11ed-aecf-00266cf8869c
    Inferring config /var/lib/ceph/4d54823c-fb05-11ed-aecf-00266cf8869c/mon.mostha1/config
    Using ceph image with id 'c9a1062f7289' and tag 'v17' created on 2023-04-25 16:04:33 +0000 UTC
    quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e
    Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
    -------------------------------------------------------------------------
    /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
    /dev/sdc | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
    /dev/sdb | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good


Could it be a bug in ceph-volume?
Adam suggested looking at the underlying commands (lsblk, blkid, udevadm,
lvs, or pvs) but I'm not very comfortable with blkid and udevadm. Is
there a "debug flag" to make Ceph more verbose?
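
The underlying tools named above can each be run by hand against a single device. The following is only a sketch: the helper name `probe_device` and the `RUNNER` variable are made up for illustration, and the options shown are one reasonable selection, not necessarily what ceph-volume itself runs.

```shell
#!/bin/sh
# Hypothetical helper: run the low-level tools that ceph-volume relies on
# against one block device, so their raw output can be inspected by hand.
# Set RUNNER=echo to print the commands instead of executing them.
probe_device() {
    dev=$1
    run=${RUNNER:-}
    # Kernel view: type, size, removable / read-only / rotational flags.
    $run lsblk -o NAME,TYPE,SIZE,RM,RO,ROTA,FSTYPE,MOUNTPOINT "$dev"
    # Low-level signature probe (may print nothing on a blank disk).
    $run blkid -p "$dev"
    # udev properties recorded for the device.
    $run udevadm info --query=property --name="$dev"
    # LVM view: physical and logical volumes known on the host.
    $run pvs
    $run lvs
}
```

Running `probe_device /dev/sdb` as root, on the host or inside `cephadm shell`, prints the five outputs one after the other; `RUNNER=echo probe_device /dev/sdb` only previews the commands.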

Thanks

Patrick

On 15/05/2023 at 21:20, Adam King wrote:
As you already seem to have figured out, "ceph orch device ls" is
populated with the results from "ceph-volume inventory". My best guess
to try and debug this would be to manually run "cephadm ceph-volume --
inventory" (the same as "cephadm ceph-volume inventory", I just like
to separate the ceph-volume command from cephadm itself with the " --
") and then check /var/log/ceph/<fsid>/ceph-volume.log from when you
ran the command onward to try and see why it isn't seeing your
devices. For example I can see a line like

[2023-05-15 19:11:58,048][ceph_volume.main][INFO  ] Running command: ceph-volume  inventory

in there. Then if I look onward from there I can see it ran things like

lsblk -P -o NAME,KNAME,PKNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL

as part of getting my device list. So if I was having issues I would
try running that directly and see what I got. Will note that
ceph-volume on certain more recent versions (not sure about octopus)
runs commands through nsenter, so you'd have to look past that part in
the log lines to the underlying command being used, typically
something with lsblk, blkid, udevadm, lvs, or pvs.
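
To pull those underlying commands out of ceph-volume.log without reading the whole file, something like the sketch below could help. It assumes each command is logged on a single line containing "Running command:", as in the line quoted above; the function name is hypothetical, and any nsenter prefix is left in place rather than stripped.

```shell
#!/bin/sh
# Hypothetical helper: print every command logged since the most recent
# "ceph-volume inventory" run in a ceph-volume.log file ($1).
commands_since_last_inventory() {
    awk '
        # Each new inventory run restarts the capture.
        /Running command: ceph-volume +inventory/ { n = 0 }
        # Keep only the command text of every "Running command:" line.
        /Running command:/ {
            sub(/^.*Running command: */, "")
            cmds[n++] = $0
        }
        END { for (i = 0; i < n; i++) print cmds[i] }
    ' "$1"
}
```

For example, `commands_since_last_inventory /var/log/ceph/<fsid>/ceph-volume.log` (with the real fsid substituted) lists the inventory invocation followed by the lsblk, blkid, udevadm, lvs and pvs calls it triggered, ready to be re-run one at a time.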

Also, if you want to see if it's an issue with a certain version of
ceph-volume, you can use different versions by passing the image flag
to cephadm. E.g.

cephadm --image quay.io/ceph/ceph:v17.2.6 ceph-volume -- inventory

would use the 17.2.6 version of ceph-volume for the inventory. It
works by running ceph-volume through the container, so you don't have
to worry about installing different packages to try them, and
it should pull the container image on its own if it isn't on the
machine already (but note that this means the command will take longer
the first time, while it pulls the image).
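
The same idea can be looped over several releases to see where the inventory starts coming back empty. This is a sketch only: the function name and the `CEPHADM` override are hypothetical, and any tag other than v17.2.6 is an assumption that must exist on quay.io.

```shell
#!/bin/sh
# Hypothetical helper: run "ceph-volume inventory" from several container
# image tags in turn so the outputs can be compared side by side.
# CEPHADM can be overridden (e.g. CEPHADM=echo) to preview the commands.
compare_inventories() {
    for tag in "$@"; do
        echo "== quay.io/ceph/ceph:${tag} =="
        ${CEPHADM:-cephadm} --image "quay.io/ceph/ceph:${tag}" ceph-volume -- inventory
    done
}
```

A call such as `compare_inventories v16.2.13 v17.2.6` prints one labelled inventory per tag; each image is pulled on first use, so the first pass is slow.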



On Sat, May 13, 2023 at 4:34 AM Patrick Begou
<Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi Joshua,

I've tried these commands but it looks like CEPH is unable to see and
configure these HDDs.
[root@mostha1 ~]# cephadm ceph-volume inventory

    Inferring fsid 4b7a6504-f0be-11ed-be1a-00266cf8869c
    Using recent ceph image
    quay.io/ceph/ceph@sha256:e6919776f0ff8331a8e9c4b18d36c5e9eed31e1a80da62ae8454e42d10e95544

    Device Path               Size Device nodes rotates
    available Model name

[root@mostha1 ~]# cephadm shell

[ceph: root@mostha1 /]# ceph orch apply osd --all-available-devices

    Scheduled osd.all-available-devices update...

[ceph: root@mostha1 /]# ceph orch device ls
[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdb

    --> Zapping: /dev/sdb
    --> --destroy was not specified, but zapping a whole device will remove the partition table
    Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
      stderr: 10+0 records in
    10+0 records out
    10485760 bytes (10 MB, 10 MiB) copied, 0.10039 s, 104 MB/s
    --> Zapping successful for: <Raw Device: /dev/sdb>

I can check that /dev/sdb1 has been erased, so the previous command was
successful:
[ceph: root@mostha1 ceph]# lsblk
NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                    8:0    1 232.9G  0 disk
|-sda1                 8:1    1   3.9G  0 part /rootfs/boot
|-sda2                 8:2    1  78.1G  0 part
| `-osvg-rootvol     253:0    0  48.8G  0 lvm  /rootfs
|-sda3                 8:3    1   3.9G  0 part [SWAP]
`-sda4                 8:4    1 146.9G  0 part
   |-secretvg-homevol 253:1    0   9.8G  0 lvm  /rootfs/home
   |-secretvg-tmpvol  253:2    0   9.8G  0 lvm  /rootfs/tmp
   `-secretvg-varvol  253:3    0   9.8G  0 lvm  /rootfs/var
sdb                    8:16   1 465.8G  0 disk
sdc                    8:32   1 232.9G  0 disk

But still no visible HDD:

[ceph: root@mostha1 ceph]# ceph orch apply osd --all-available-devices

    Scheduled osd.all-available-devices update...

[ceph: root@mostha1 ceph]# ceph orch device ls
[ceph: root@mostha1 ceph]#

Maybe I did something wrong at install time, as I unintentionally ran
the following inside the container:

dnf -y install https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm

(an awful copy/paste that launched the command). Can this break the
container? I do not know which ceph packages should be present in
the container, so I cannot cleanly remove this install (there is no
dnf.log file in the container).

Patrick


On 12/05/2023 at 21:38, Beaman, Joshua wrote:
The most significant point I see there is that you have no OSD service
spec to tell orchestrator how to deploy OSDs.  The easiest fix for
that would be “ceph orch apply osd --all-available-devices”

This will create a simple spec that should work for a test
environment.  Most likely it will collocate the block, block.db, and
WAL all on the same device.  Not ideal for prod environments, but fine
for practice and testing.

The other command I should have had you try is “cephadm ceph-volume
inventory”.  That should show you the devices available for OSD
deployment, and hopefully matches up to what your “lsblk” shows.  If
you need to zap HDDs and orchestrator is still not seeing them, you
can try “cephadm ceph-volume lvm zap /dev/sdb”
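
One way to check whether the inventory matches lsblk is to compare the two device lists mechanically. The sketch below assumes that data rows of the inventory table start with a /dev/ path (as the "Device Path" header suggests); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list whole disks that lsblk reports but that are
# absent from the "ceph-volume inventory" table.
#   $1: file holding the output of `lsblk -dn -o NAME,TYPE`
#   $2: file holding the output of `cephadm ceph-volume inventory`
missing_from_inventory() {
    awk '
        # First file (inventory): remember every row that starts with /dev/.
        FILENAME == ARGV[1] { if ($1 ~ /^\/dev\//) seen[$1] = 1; next }
        # Second file (lsblk): print disks that the inventory did not list.
        $2 == "disk" && !(("/dev/" $1) in seen) { print "/dev/" $1 }
    ' "$2" "$1"
}
```

After saving the two outputs, for instance with `lsblk -dn -o NAME,TYPE > lsblk.txt` and `cephadm ceph-volume inventory > inventory.txt`, calling `missing_from_inventory lsblk.txt inventory.txt` prints every disk the kernel sees that ceph-volume left out. The OS disk shows up here as well if it is missing from the table.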

Thank you,

Josh Beaman

From: Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>
Date: Friday, May 12, 2023 at 2:22 PM
To: Beaman, Joshua <Joshua_Beaman@xxxxxxxxxxx>, ceph-users <ceph-users@xxxxxxx>
Subject: Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

Hi Joshua and thanks for this quick reply.

At this step I have only one node. I was checking what ceph returned
with different commands on this host before adding new hosts, just to
compare with my first Octopus install. As this hardware is for testing
only, it remains easy for me to break everything and reinstall again.

[root@mostha1 ~]# cephadm check-host

     podman (/usr/bin/podman) version 4.2.0 is present
     systemctl is present
     lvcreate is present
     Unit chronyd.service is enabled and running
     Host looks OK

[ceph: root@mostha1 /]# ceph -s

       cluster:
         id: 4b7a6504-f0be-11ed-be1a-00266cf8869c
         health: HEALTH_WARN
                 OSD count 0 < osd_pool_default_size 3

       services:
         mon: 1 daemons, quorum mostha1.legi.grenoble-inp.fr (age 5h)
         mgr: mostha1.legi.grenoble-inp.fr.hogwuz(active, since 5h)
         osd: 0 osds: 0 up, 0 in

       data:
         pools:   0 pools, 0 pgs
         objects: 0 objects, 0 B
         usage:   0 B used, 0 B / 0 B avail
         pgs:

[ceph: root@mostha1 /]# ceph orch ls

     NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
     alertmanager   ?:9093,9094  1/1      6m ago     6h   count:1
     crash                       1/1      6m ago     6h   *
     grafana        ?:3000       1/1      6m ago     6h   count:1
     mgr                         1/2      6m ago     6h   count:2
     mon                         1/5      6m ago     6h   count:5
     node-exporter  ?:9100       1/1      6m ago     6h   *
     prometheus     ?:9095       1/1      6m ago     6h   count:1

[ceph: root@mostha1 /]# ceph orch ls osd --export

     No services reported

[ceph: root@mostha1 /]# ceph orch host ls

     HOST                          ADDR           LABELS  STATUS
     mostha1.legi.grenoble-inp.fr  194.254.66.34  _admin
     1 hosts in cluster

[ceph: root@mostha1 /]# ceph log last cephadm

     ...
     2023-05-12T15:19:58.754655+0000 mgr.mostha1.legi.grenoble-inp.fr.hogwuz (mgr.44098) 1876 : cephadm [INF] Zap device mostha1.legi.grenoble-inp.fr:/dev/sdb
     2023-05-12T15:19:58.756639+0000 mgr.mostha1.legi.grenoble-inp.fr.hogwuz (mgr.44098) 1877 : cephadm [ERR] Device path '/dev/sdb' not found on host 'mostha1.legi.grenoble-inp.fr'
     Traceback (most recent call last):
       File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
         return OrchResult(f(*args, **kwargs))
       File "/usr/share/ceph/mgr/cephadm/module.py", line 2275, in zap_device
         f"Device path '{path}' not found on host '{host}'")
     orchestrator._interface.OrchestratorError: Device path '/dev/sdb' not found on host 'mostha1.legi.grenoble-inp.fr'
     ....

[ceph: root@mostha1 /]# ls -l /dev/sdb

     brw-rw---- 1 root disk 8, 16 May 12 15:16 /dev/sdb

[ceph: root@mostha1 /]# lsblk /dev/sdb

     NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
     sdb      8:16   1 465.8G  0 disk
     `-sdb1   8:17   1 465.8G  0 part

I have created a single full-size partition on /dev/sdb (for testing),
and /dev/sdc has no partition table (removed).

But everything seems fine with these commands.

Patrick

On 12/05/2023 at 20:19, Beaman, Joshua wrote:

     I don’t quite understand why that zap would not work.  But, here’s
     where I’d start.

      1. cephadm check-host

          1. Run this on each of your hosts to make sure cephadm,
             podman and all other prerequisites are installed and
             recognized

      2. ceph orch ls

          1. This should show at least a mon, mgr, and osd spec
             deployed

      3. ceph orch ls osd --export

          1. This will show the OSD placement service specifications
             that orchestrator uses to identify devices to deploy
             as OSDs

      4. ceph orch host ls

          1. This will list the hosts that have been added to
             orchestrator’s inventory, and what labels are applied
             which correlate to the service placement labels

      5. ceph log last cephadm

          1. This will show you what orchestrator has been trying to
             do, and how it may be failing

     Also, it’s never un-helpful to have a look at “ceph -s” and “ceph
     health detail”, particularly for any people trying to help you
     without access to your systems.
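
The checks listed above can be gathered into one labelled report for posting to the list. This is a sketch under assumptions: the function name and the `RUNNER` variable are hypothetical, and wrapping the ceph commands in `cephadm shell --` is one way to run them from the host rather than from an interactive container shell.

```shell
#!/bin/sh
# Hypothetical helper: run the diagnostic commands in order and label each
# output so the whole report can be pasted into a mail.
# Set RUNNER=echo to print the commands instead of executing them.
collect_diagnostics() {
    run=${RUNNER:-}
    for cmd in \
        "cephadm check-host" \
        "cephadm shell -- ceph orch ls" \
        "cephadm shell -- ceph orch ls osd --export" \
        "cephadm shell -- ceph orch host ls" \
        "cephadm shell -- ceph log last cephadm" \
        "cephadm shell -- ceph -s" \
        "cephadm shell -- ceph health detail"
    do
        echo "### $cmd"
        # $cmd is left unquoted on purpose so that it splits into words.
        $run $cmd 2>&1 || echo "(exit status $?)"
    done
}
```

Running `collect_diagnostics > report.txt` as root on the node keeps going even when one command fails, and records the failing exit status in place of the missing output.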

     Best of luck,

     Josh Beaman

     From: Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>
     Date: Friday, May 12, 2023 at 10:45 AM
     To: ceph-users <ceph-users@xxxxxxx>
     Subject: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

     Hi everyone

     I'm new to Ceph: I have just followed a 4-day French training
     session with Octopus on VMs, which convinced me to build my first
     cluster.

     At this time I have 4 old identical nodes for testing, each with
     3 HDDs and 2 network interfaces, running Alma Linux 8 (el8). I tried
     to replay the training session but it failed, breaking the web
     interface because podman 4.2 is not compatible with Octopus.

     So I am trying to deploy Pacific with the cephadm tool on my first
     node (mostha1), which will also let me test an upgrade later.

         dnf -y install \
             https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
         monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
         cephadm bootstrap --mon-ip $monip --initial-dashboard-password xxxxx \
                           --initial-dashboard-user admceph \
                           --allow-fqdn-hostname --cluster-network 10.1.0.0/16

     This was successful.

     But running "ceph orch device ls" does not show any HDD, even
     though I have /dev/sda (used by the OS), /dev/sdb and /dev/sdc.

     The web interface shows a raw capacity which is the aggregate of
     the sizes of the 3 HDDs of the node.

     I've also tried to reset /dev/sdb but cephadm does not see it:

         [ceph: root@mostha1 /]# ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdb --force
         Error EINVAL: Device path '/dev/sdb' not found on host 'mostha1.legi.grenoble-inp.fr'

     On my first attempt with Octopus, I was able to list the available
     HDDs with this command line. Before moving to Pacific, the OS on
     this node was reinstalled from scratch.

     Any advice for a Ceph beginner?

     Thanks

     Patrick
 _______________________________________________
     ceph-users mailing list -- ceph-users@xxxxxxx
     To unsubscribe send an email to ceph-users-leave@xxxxxxx




