Re: cephadm picks development/latest tagged image for daemon-base (docker.io/ceph/daemon-base:latest-pacific-devel)

Hi Adam,

Thanks for the update. In that case, this does look like the bug you
mentioned.

Here are the contents of the config file used for bootstrapping.

[global]
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 8
osd pool default pgp num = 8
osd recovery delay start = 60
osd memory target = 1610612736
osd failsafe full ratio = 1.0
mon pg warn max object skew = 20
mon osd nearfull ratio = 0.8
mon osd backfillfull ratio = 0.87
mon osd full ratio = 0.95
mon max pg per osd = 400
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
[osd]
bluestore compression mode = passive
[mon]
mon osd allow primary affinity = true
mon allow pool delete = true
[client]
rbd cache = true
rbd cache writethrough until flush = true
rbd concurrent management ops = 20
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/ceph/client.$pid.log

Output of bootstrap command:

[root@hcictrl01 stack_orchestrator]# sudo cephadm --image quay.io/ceph/ceph:v16.2.7 bootstrap \
    --skip-monitoring-stack --mon-ip 10.175.41.11 --cluster-network 10.175.42.0/24 \
    --ssh-user ceph_deploy --ssh-private-key /home/ceph_deploy/.ssh/id_rsa \
    --ssh-public-key /home/ceph_deploy/.ssh/id_rsa.pub \
    --config /home/ceph_deploy/ceph_bootstrap/ceph.conf \
    --initial-dashboard-password J959ABCFRFGE --dashboard-password-noupdate \
    --no-minimize-config --skip-pull

Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/bin/podman) version 3.3.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: dba72000-8525-11ec-b1e7-0015171590ba
Verifying IP 10.175.41.11 port 3300 ...
Verifying IP 10.175.41.11 port 6789 ...
Mon IP `10.175.41.11` is in CIDR network `10.175.41.0/24`
Ceph version: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Setting mon public_network to 10.175.41.0/24
Setting cluster_network to 10.175.42.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Using provided ssh keys...
Adding host hcictrl01...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:

             URL: https://hcictrl01.enclouden.com:8443/
            User: admin
        Password: J959ABCFRFGE

Enabling client.admin keyring and conf on hosts with "admin" label
You can access the Ceph CLI with:

        sudo /sbin/cephadm shell --fsid dba72000-8525-11ec-b1e7-0015171590ba -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Please consider enabling telemetry to help improve Ceph:

        ceph telemetry on

For more information see:

        https://docs.ceph.com/docs/pacific/mgr/telemetry/

Bootstrap complete.


List of containers created after bootstrap:

[root@hcictrl01 stack_orchestrator]# podman ps
CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS                 PORTS  NAMES
c7bfdf3b5831  quay.io/ceph/ceph:v16.2.7                         -n mon.hcictrl01 ...  7 minutes ago       Up 7 minutes ago              ceph-dba72000-8525-11ec-b1e7-0015171590ba-mon-hcictrl01
67c1e6f2ff1f  quay.io/ceph/ceph:v16.2.7                         -n mgr.hcictrl01....  7 minutes ago       Up 7 minutes ago              ceph-dba72000-8525-11ec-b1e7-0015171590ba-mgr-hcictrl01-fvopfn
6e87fba9235d  docker.io/ceph/daemon-base:latest-pacific-devel   -n client.crash.h...  About a minute ago  Up About a minute ago         ceph-dba72000-8525-11ec-b1e7-0015171590ba-crash-hcictrl01

[root@hcictrl01 stack_orchestrator]# ceph orch ps
NAME                  HOST       PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID
crash.hcictrl01       hcictrl01          running (87s)  83s ago    87s  6975k    -        16.2.5-387-g7282d81d  41387741ad94  6e87fba9235d
mgr.hcictrl01.fvopfn  hcictrl01  *:9283  running (7m)   83s ago    7m   399M     -        16.2.7                231fd40524c4  67c1e6f2ff1f
mon.hcictrl01         hcictrl01          running (8m)   83s ago    8m   45.4M    2048M    16.2.7                231fd40524c4  c7bfdf3b5831

[root@hcictrl01 stack_orchestrator]# podman images
REPOSITORY                  TAG                   IMAGE ID      CREATED       SIZE
quay.io/ceph/ceph           v16.2.7               231fd40524c4  2 days ago    1.39 GB
docker.io/ceph/daemon-base  latest-pacific-devel  41387741ad94  5 months ago  1.23 GB

As you can see, the crash daemon is being created from the image
'docker.io/ceph/daemon-base:latest-pacific-devel' and is not respecting the
--image flag we provided. Also, we are not setting any config anywhere other
than the bootstrap conf file (and, as shown above, that file does not set
container_image).
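
In case it helps with debugging, here is a quick sanity check we can run on
our side (just a diagnostic sketch using the standard config/orch commands;
the daemon name is the crash daemon from the 'ceph orch ps' output above):

        ceph config get mgr container_image
        ceph config set global container_image quay.io/ceph/ceph:v16.2.7
        ceph orch daemon redeploy crash.hcictrl01

Setting container_image like this only works around the symptom; it does not
explain why the --image flag given to bootstrap is being ignored.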


I have also attached the full cephadm log; I hope you can view it from the
email. Let me know if you need any further data.

Thanks in advance

Regards,
Arun Vinod

On Fri, 4 Feb 2022 at 00:17, Adam King <adking@xxxxxxxxxx> wrote:

> But, even if I gave the --image flag with bootstrap, the daemons created by the
>> mgr module are using the daemon-base image, in our case it's '
>> docker.io/ceph/daemon-base:latest-pacific-devel'.
>> Which I guess is because the mgr daemon takes into consideration the
>> configuration parameter 'container_image', whose default value is '
>> docker.io/ceph/daemon-base:latest-pacific-devel'.
>> What we guess is that even if we provide the --image flag in cephadm bootstrap,
>> cephadm is not updating the variable container_image with this value.
>> Hence, all the remaining daemons are getting created using the
>> daemon-base image.
>
>
> This is not how it's supposed to work. If you provide "--image
> <image-name>" to bootstrap, all ceph daemons deployed, including the mon/mgr
> deployed during bootstrap AND the daemons deployed by the cephadm mgr
> module afterwards, should be deployed with the image provided to the
> "--image" parameter. You shouldn't need to set any config options or do
> anything extra to get that to work. If you're providing "--image" to
> bootstrap and this is not happening there is a serious bug (not including
> the fact that the bootstrap mgr/mon show the tag while others show the
> digest, that's purely cosmetic). If that's the case, could you post the
> full bootstrap output and the contents of the config file you're passing to
> bootstrap, so maybe we can debug? I've never seen this issue before
> anywhere else so I have no way to recreate it (for me passing --image in
> bootstrap causes all ceph daemons to be deployed with that image until I
> explicitly specify another image through upgrade or other means).
>
> Also, regarding the non-uniform behaviour of the first mon even if created
>> using the same image is quite surprising. I double checked the
>> configuration of all mon, and could not find a major difference between
>> first and remaining mons. I tried to reconfigure the first mon, which ended up
>> in the same corner. However, redeploying that specific mon with the command
>> 'ceph orch redeploy <name> quay.io/ceph/ceph:v16.2.7' caused the first
>> mon to show the same warning as the rest, since it got redeployed by the mgr.
>
>
> Are we expecting any difference between the mon deployed by cephadm
>> bootstrap and a mon deployed by the mgr, even if we're using the same image?
>> We have only the absence of the warning on the first mon to suggest that there
>> might be a difference between the first mon and the rest of the mons.
>
>
> I could maybe see some difference if you add specific config options as
> the mon deployed during bootstrap is deployed with basic settings. Since we
> can't infer config settings into the mon store until there is an existing
> monitor this is sort of necessary and could maybe cause some differences
> between that mon and others. This should be resolved by a redeploy of the
> mon. Can you tell me if you're setting any mon-related config options in
> the conf you're providing to bootstrap (or if you've set any config options
> elsewhere)? It may be that cephadm needs to actively redeploy the mon if
> certain options are provided, and I can look into it if I know which
> sorts of config options are causing the health warning. I haven't seen that
> health warning in my own testing (on the bootstrap mon or those deployed by
> the mgr module) so I'd need to know what's causing it to come about to come
> up with a good fix.
>
>
> - Adam King
>
> On Thu, Feb 3, 2022 at 11:29 AM Arun Vinod <arunvinod.tech@xxxxxxxxx>
> wrote:
>
>> Hi Adam,
>>
>> Thanks for reviewing the long output.
>>
>> Like you said, it makes total sense now since the first mon and mgr are
>> created by cephadm bootstrap and the rest of the daemons by the mgr module.
>>
>> But, even if I gave the --image flag with bootstrap, the daemons created by
>> the mgr module are using the daemon-base image, in our case it's '
>> docker.io/ceph/daemon-base:latest-pacific-devel'.
>> Which I guess is because the mgr daemon takes into consideration the
>> configuration parameter 'container_image', whose default value is '
>> docker.io/ceph/daemon-base:latest-pacific-devel'.
>>
>> What we guess is that even if we provide the --image flag in cephadm bootstrap,
>> cephadm is not updating the variable container_image with this value.
>> Hence, all the remaining daemons are getting created using the
>> daemon-base image.
>>
>> Below is the value of config 'container_image' after bootstrapping with
>> --image flag provided.
>>
>> [root@hcictrl01 stack_orchestrator]# ceph-conf -D | grep -i
>> container_image
>> container_image = docker.io/ceph/daemon-base:latest-pacific-devel
>>
>> However, one workaround is to provide this config in the initial
>> bootstrap config file and present it to cephadm bootstrap using the
>> --config flag, which updates the image name so that all the daemons get
>> created with the same image.
>>
>> Also, regarding the non-uniform behaviour of the first mon even if
>> created using the same image is quite surprising. I double checked the
>> configuration of all mon, and could not find a major difference between
>> first and remaining mons. I tried to reconfigure the first mon, which ended up
>> in the same corner. However, redeploying that specific mon with the command
>> 'ceph orch redeploy <name> quay.io/ceph/ceph:v16.2.7' caused the first
>> mon to show the same warning as the rest, since it got redeployed by the mgr.
>>
>> Are we expecting any difference between the mon deployed by cephadm
>> bootstrap and a mon deployed by the mgr, even if we're using the same image?
>> We have only the absence of the warning on the first mon to suggest that there
>> might be a difference between the first mon and the rest of the mons.
>>
>> Thanks again Adam for checking this. Your insights into this will be
>> highly appreciated.
>>
>> Thanks and Regards,
>> Arun Vinod
>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


