Hi Arun,

A couple of questions. First, where did you pull your cephadm binary from (the Python file used for bootstrap)? I know we swapped everything over to quay a while ago (https://github.com/ceph/ceph/commit/b291aa47825ece9fcfe9831546e1d8355b3202e4), so I want to make sure that if I try to recreate this I have the same version of the binary. Secondly, I'm curious what your reason is for supplying the "--no-minimize-config" flag. Were you getting some unwanted behavior without it?

I'll see if I can figure out what's going on here. Again, I've never seen this before, so it might be difficult for me to recreate, but I'll see what I can do. In the meantime, hopefully using the upgrade as a workaround is at least okay for you.
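In case it helps while you're checking: the copy I'd compare against is the one fetched straight from the ceph repo. This assumes you want the pacific version of the script, and the grep is just a rough way to confirm it's a post-quay-migration copy:

    # fetch the standalone cephadm script for pacific and make it executable
    curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
    chmod +x cephadm
    # a copy from after the quay migration should reference quay.io/ceph/ceph here
    grep -n 'quay.io/ceph/ceph' cephadm

And just so it's written down somewhere, the upgrade workaround I mean is roughly the following (the image tag is simply the one from your bootstrap command):

    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.7
    ceph orch upgrade status   # watch progress; all daemons should end up on the quay image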
- Adam King

On Thu, Feb 3, 2022 at 2:32 PM Arun Vinod <arunvinod.tech@xxxxxxxxx> wrote:

> Hi Adam,
>
> Thanks for the update. In that case this looks like a bug, like you
> mentioned.
>
> Here are the contents of the config file used for bootstrapping:
>
> [global]
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 8
> osd pool default pgp num = 8
> osd recovery delay start = 60
> osd memory target = 1610612736
> osd failsafe full ratio = 1.0
> mon pg warn max object skew = 20
> mon osd nearfull ratio = 0.8
> mon osd backfillfull ratio = 0.87
> mon osd full ratio = 0.95
> mon max pg per osd = 400
> debug asok = 0/0
> debug auth = 0/0
> debug buffer = 0/0
> debug client = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug filer = 0/0
> debug filestore = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug journal = 0/0
> debug journaler = 0/0
> debug lockdep = 0/0
> debug mds = 0/0
> debug mds balancer = 0/0
> debug mds locker = 0/0
> debug mds log = 0/0
> debug mds log expire = 0/0
> debug mds migrator = 0/0
> debug mon = 0/0
> debug monc = 0/0
> debug ms = 0/0
> debug objclass = 0/0
> debug objectcacher = 0/0
> debug objecter = 0/0
> debug optracker = 0/0
> debug osd = 0/0
> debug paxos = 0/0
> debug perfcounter = 0/0
> debug rados = 0/0
> debug rbd = 0/0
> debug rgw = 0/0
> debug throttle = 0/0
> debug timer = 0/0
> debug tp = 0/0
>
> [osd]
> bluestore compression mode = passive
>
> [mon]
> mon osd allow primary affinity = true
> mon allow pool delete = true
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> rbd concurrent management ops = 20
> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
> log file = /var/log/ceph/client.$pid.log
>
> Output of the bootstrap command:
>
> [root@hcictrl01 stack_orchestrator]# sudo cephadm --image quay.io/ceph/ceph:v16.2.7 bootstrap --skip-monitoring-stack --mon-ip 10.175.41.11 --cluster-network 10.175.42.0/24 --ssh-user ceph_deploy --ssh-private-key /home/ceph_deploy/.ssh/id_rsa --ssh-public-key /home/ceph_deploy/.ssh/id_rsa.pub --config /home/ceph_deploy/ceph_bootstrap/ceph.conf --initial-dashboard-password J959ABCFRFGE --dashboard-password-noupdate --no-minimize-config --skip-pull
>
> Verifying podman|docker is present...
> Verifying lvm2 is present...
> Verifying time synchronization is in place...
> Unit chronyd.service is enabled and running
> Repeating the final host check...
> podman (/bin/podman) version 3.3.1 is present
> systemctl is present
> lvcreate is present
> Unit chronyd.service is enabled and running
> Host looks OK
> Cluster fsid: dba72000-8525-11ec-b1e7-0015171590ba
> Verifying IP 10.175.41.11 port 3300 ...
> Verifying IP 10.175.41.11 port 6789 ...
> Mon IP `10.175.41.11` is in CIDR network `10.175.41.0/24`
> Ceph version: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
> Extracting ceph user uid/gid from container image...
> Creating initial keys...
> Creating initial monmap...
> Creating mon...
> Waiting for mon to start...
> Waiting for mon...
> mon is available
> Setting mon public_network to 10.175.41.0/24
> Setting cluster_network to 10.175.42.0/24
> Wrote config to /etc/ceph/ceph.conf
> Wrote keyring to /etc/ceph/ceph.client.admin.keyring
> Creating mgr...
> Verifying port 9283 ...
> Waiting for mgr to start...
> Waiting for mgr...
> mgr not available, waiting (1/15)...
> mgr not available, waiting (2/15)...
> mgr not available, waiting (3/15)...
> mgr not available, waiting (4/15)...
> mgr is available
> Enabling cephadm module...
> Waiting for the mgr to restart...
> Waiting for mgr epoch 5...
> mgr epoch 5 is available
> Setting orchestrator backend to cephadm...
> Using provided ssh keys...
> Adding host hcictrl01...
> Deploying mon service with default placement...
> Deploying mgr service with default placement...
> Deploying crash service with default placement...
> Enabling the dashboard module...
> Waiting for the mgr to restart...
> Waiting for mgr epoch 9...
> mgr epoch 9 is available
> Generating a dashboard self-signed certificate...
> Creating initial admin user...
> Fetching dashboard port number...
> Ceph Dashboard is now available at:
>
>              URL: https://hcictrl01.enclouden.com:8443/
>             User: admin
>         Password: J959ABCFRFGE
>
> Enabling client.admin keyring and conf on hosts with "admin" label
> You can access the Ceph CLI with:
>
>         sudo /sbin/cephadm shell --fsid dba72000-8525-11ec-b1e7-0015171590ba -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
>
> Please consider enabling telemetry to help improve Ceph:
>
>         ceph telemetry on
>
> For more information see:
>
>         https://docs.ceph.com/docs/pacific/mgr/telemetry/
>
> Bootstrap complete.
>
> List of containers created after bootstrap:
>
> [root@hcictrl01 stack_orchestrator]# podman ps
> CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS                 PORTS  NAMES
> c7bfdf3b5831  quay.io/ceph/ceph:v16.2.7                         -n mon.hcictrl01 ...  7 minutes ago       Up 7 minutes ago              ceph-dba72000-8525-11ec-b1e7-0015171590ba-mon-hcictrl01
> 67c1e6f2ff1f  quay.io/ceph/ceph:v16.2.7                         -n mgr.hcictrl01....  7 minutes ago       Up 7 minutes ago              ceph-dba72000-8525-11ec-b1e7-0015171590ba-mgr-hcictrl01-fvopfn
> 6e87fba9235d  docker.io/ceph/daemon-base:latest-pacific-devel   -n client.crash.h...  About a minute ago  Up About a minute ago         ceph-dba72000-8525-11ec-b1e7-0015171590ba-crash-hcictrl01
>
> [root@hcictrl01 stack_orchestrator]# ceph orch ps
> NAME                  HOST       PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID
> crash.hcictrl01       hcictrl01          running (87s)  83s ago    87s  6975k    -        16.2.5-387-g7282d81d  41387741ad94  6e87fba9235d
> mgr.hcictrl01.fvopfn  hcictrl01  *:9283  running (7m)   83s ago    7m   399M     -        16.2.7                231fd40524c4  67c1e6f2ff1f
> mon.hcictrl01         hcictrl01          running (8m)   83s ago    8m   45.4M    2048M    16.2.7                231fd40524c4  c7bfdf3b5831
>
> [root@hcictrl01 stack_orchestrator]# podman images
> REPOSITORY                  TAG                   IMAGE ID      CREATED       SIZE
> quay.io/ceph/ceph           v16.2.7               231fd40524c4  2 days ago    1.39 GB
> docker.io/ceph/daemon-base  latest-pacific-devel  41387741ad94  5 months ago  1.23 GB
>
> As you can see, the crash daemon is getting created from the image
> 'docker.io/ceph/daemon-base:latest-pacific-devel' and is not respecting
> the --image flag provided. Also, we are not setting any config elsewhere
> other than the bootstrap conf file.
>
> I have also attached the full cephadm log; hope you can view it from the
> email. Let me know if you need any further data.
>
> Thanks in advance.
>
> Regards,
> Arun Vinod
>
> On Fri, 4 Feb 2022 at 00:17, Adam King <adking@xxxxxxxxxx> wrote:
>
>>> But, even if I gave the --image flag with bootstrap, the daemons created
>>> by the mgr module are using the daemon-base image, in our case
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>> Which I guess is because the mgr daemon takes into consideration the
>>> configuration parameter 'container_image', whose default value is
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>> What we guess is that even if we provide the --image flag in cephadm
>>> bootstrap, cephadm is not updating the variable container_image with this
>>> value. Hence, all the remaining daemons are getting created using the
>>> daemon-base image.
>>
>> This is not how it's supposed to work. If you provide "--image
>> <image-name>" to bootstrap, all ceph daemons deployed, including the
>> mon/mgr deployed during bootstrap AND the daemons deployed by the cephadm
>> mgr module afterwards, should be deployed with the image provided to the
>> "--image" parameter. You shouldn't need to set any config options or do
>> anything extra to get that to work. If you're providing "--image" to
>> bootstrap and this is not happening, there is a serious bug (not counting
>> the fact that the bootstrap mgr/mon show the tag while the others show the
>> digest; that's purely cosmetic). If that's the case, could you post the
>> full bootstrap output and the contents of the config file you're passing
>> to bootstrap, and maybe we can debug it. I've never seen this issue before
>> anywhere else, so I have no way to recreate it (for me, passing --image in
>> bootstrap causes all ceph daemons to be deployed with that image until I
>> explicitly specify another image through upgrade or other means).
>>
>>> Also, the non-uniform behaviour of the first mon, even though it was
>>> created using the same image, is quite surprising. I double-checked the
>>> configuration of all the mons and could not find a major difference
>>> between the first and the remaining mons. I tried to reconfig the first
>>> mon, which ended up in the same situation. However, redeploying that
>>> specific mon with the command 'ceph orch redeploy <name>
>>> quay.io/ceph/ceph:v16.2.7' caused the first mon to also show the same
>>> warning as the rest, as it got redeployed by the mgr.
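>>
>> (A quick aside on that: if you want to pin down what, if anything, actually
>> differs between the bootstrap mon and the ones deployed by the mgr module,
>> a rough check would be something like the following. The daemon name is just
>> the one from your 'ceph orch ps' listing, and I'm assuming here that
>> container_image is the option the module reads for new daemons:
>>
>>     # options the running bootstrap mon actually has
>>     ceph config show mon.hcictrl01
>>     # image the cephadm module should use for anything it deploys
>>     ceph config get mgr container_image
>>
>> Treat that as a sketch rather than a definitive diagnostic.)
>>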
>>
>>> Are we expecting any difference between the mon deployed by cephadm
>>> bootstrap and the mon deployed by the mgr, even if we're using the same
>>> image? We have only the lack of the warning on the first mon to suggest
>>> that there might be a difference between the first mon and the rest of
>>> the mons.
>>
>> I could maybe see some difference if you add specific config options, as
>> the mon deployed during bootstrap is deployed with basic settings. Since
>> we can't infer config settings into the mon store until there is an
>> existing monitor, this is sort of necessary and could maybe cause some
>> differences between that mon and the others. This should be resolved by a
>> redeploy of the mon. Can you tell me if you're setting any mon-related
>> config options in the conf you're providing to bootstrap (or if you've set
>> any config options elsewhere)? It may be that cephadm needs to actively
>> redeploy the mon if certain options are provided, and I can look into it
>> if I know which sorts of config options are causing the health warning. I
>> haven't seen that health warning in my own testing (on the bootstrap mon
>> or on those deployed by the mgr module), so I'd need to know what's
>> causing it in order to come up with a good fix.
>>
>> - Adam King
>>
>> On Thu, Feb 3, 2022 at 11:29 AM Arun Vinod <arunvinod.tech@xxxxxxxxx>
>> wrote:
>>
>>> Hi Adam,
>>>
>>> Thanks for reviewing the long output.
>>>
>>> Like you said, it makes total sense now, since the first mon and mgr are
>>> created by cephadm bootstrap and the rest of the daemons by the mgr
>>> module.
>>>
>>> But, even if I gave the --image flag with bootstrap, the daemons created
>>> by the mgr module are using the daemon-base image, in our case
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>> Which I guess is because the mgr daemon takes into consideration the
>>> configuration parameter 'container_image', whose default value is
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>>
>>> What we guess is that even if we provide the --image flag in cephadm
>>> bootstrap, cephadm is not updating the variable container_image with this
>>> value. Hence, all the remaining daemons are getting created using the
>>> daemon-base image.
>>>
>>> Below is the value of the config 'container_image' after bootstrapping
>>> with the --image flag provided.
>>>
>>> [root@hcictrl01 stack_orchestrator]# ceph-conf -D | grep -i container_image
>>> container_image = docker.io/ceph/daemon-base:latest-pacific-devel
>>>
>>> However, one workaround is to provide this config in the initial
>>> bootstrap config file and present it to the cephadm bootstrap using the
>>> flag --config, which updates the image name, and all the daemons then get
>>> created with the same image.
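>>>
>>> (For reference, a minimal sketch of what that workaround looks like in
>>> the conf file we pass to bootstrap via --config, assuming the same tag we
>>> give to --image:
>>>
>>>     [global]
>>>     container_image = quay.io/ceph/ceph:v16.2.7   # same image tag passed to --image
>>>
>>> With that line present, ceph-conf -D reports the quay image and the
>>> remaining daemons are created from it.)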
>>>
>>> Also, the non-uniform behaviour of the first mon, even though it was
>>> created using the same image, is quite surprising. I double-checked the
>>> configuration of all the mons and could not find a major difference
>>> between the first and the remaining mons. I tried to reconfig the first
>>> mon, which ended up in the same situation. However, redeploying that
>>> specific mon with the command 'ceph orch redeploy <name>
>>> quay.io/ceph/ceph:v16.2.7' caused the first mon to also show the same
>>> warning as the rest, as it got redeployed by the mgr.
>>>
>>> Are we expecting any difference between the mon deployed by cephadm
>>> bootstrap and the mon deployed by the mgr, even if we're using the same
>>> image? We have only the lack of the warning on the first mon to suggest
>>> that there might be a difference between the first mon and the rest of
>>> the mons.
>>>
>>> Thanks again, Adam, for checking this. Your insights into this will be
>>> highly appreciated.
>>>
>>> Thanks and Regards,
>>> Arun Vinod
>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx