Hi Arun,

A couple of questions. First, where did you pull your cephadm binary from (the Python file used for bootstrap)? I know we swapped everything over to quay a while ago (https://github.com/ceph/ceph/commit/b291aa47825ece9fcfe9831546e1d8355b3202e4), so I want to make sure that if I try to recreate this I have the same version of the binary. Secondly, I'm curious what your reason is for supplying the "--no-minimize-config" flag. Were you getting some unwanted behavior without it?

I'll see if I can figure out what's going on here. Again, I've never seen this before, so it might be difficult for me to recreate, but I'll see what I can do. In the meantime, hopefully using the upgrade as a workaround is at least okay for you.
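In case it helps while you're checking: the copy I'd compare against is the one fetched straight from the ceph repo. This assumes you want the pacific version of the script, and the grep is just a rough way to confirm it's a post-quay-migration copy:

    # fetch the standalone cephadm script for pacific and make it executable
    curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
    chmod +x cephadm
    # a copy from after the quay migration should reference quay.io/ceph/ceph here
    grep -n 'quay.io/ceph/ceph' cephadm

And just so it's written down somewhere, the upgrade workaround I mean is roughly the following (the image tag is simply the one from your bootstrap command):

    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.7
    ceph orch upgrade status   # watch progress; all daemons should end up on the quay image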
- Adam King

On Thu, Feb 3, 2022 at 2:32 PM Arun Vinod <arunvinod.tech@xxxxxxxxx> wrote:

> Hi Adam,
>
> Thanks for the update. In that case this looks like a bug, like you
> mentioned.
>
> Here are the contents of the config file used for bootstrapping:
>
> [global]
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 8
> osd pool default pgp num = 8
> osd recovery delay start = 60
> osd memory target = 1610612736
> osd failsafe full ratio = 1.0
> mon pg warn max object skew = 20
> mon osd nearfull ratio = 0.8
> mon osd backfillfull ratio = 0.87
> mon osd full ratio = 0.95
> mon max pg per osd = 400
> debug asok = 0/0
> debug auth = 0/0
> debug buffer = 0/0
> debug client = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug filer = 0/0
> debug filestore = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug journal = 0/0
> debug journaler = 0/0
> debug lockdep = 0/0
> debug mds = 0/0
> debug mds balancer = 0/0
> debug mds locker = 0/0
> debug mds log = 0/0
> debug mds log expire = 0/0
> debug mds migrator = 0/0
> debug mon = 0/0
> debug monc = 0/0
> debug ms = 0/0
> debug objclass = 0/0
> debug objectcacher = 0/0
> debug objecter = 0/0
> debug optracker = 0/0
> debug osd = 0/0
> debug paxos = 0/0
> debug perfcounter = 0/0
> debug rados = 0/0
> debug rbd = 0/0
> debug rgw = 0/0
> debug throttle = 0/0
> debug timer = 0/0
> debug tp = 0/0
>
> [osd]
> bluestore compression mode = passive
>
> [mon]
> mon osd allow primary affinity = true
> mon allow pool delete = true
>
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true
> rbd concurrent management ops = 20
> admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
> log file = /var/log/ceph/client.$pid.log
>
> Output of the bootstrap command:
>
> [root@hcictrl01 stack_orchestrator]# sudo cephadm --image quay.io/ceph/ceph:v16.2.7 bootstrap --skip-monitoring-stack --mon-ip 10.175.41.11 --cluster-network 10.175.42.0/24 --ssh-user ceph_deploy --ssh-private-key /home/ceph_deploy/.ssh/id_rsa --ssh-public-key /home/ceph_deploy/.ssh/id_rsa.pub --config /home/ceph_deploy/ceph_bootstrap/ceph.conf --initial-dashboard-password J959ABCFRFGE --dashboard-password-noupdate --no-minimize-config --skip-pull
>
> Verifying podman|docker is present...
> Verifying lvm2 is present...
> Verifying time synchronization is in place...
> Unit chronyd.service is enabled and running
> Repeating the final host check...
> podman (/bin/podman) version 3.3.1 is present
> systemctl is present
> lvcreate is present
> Unit chronyd.service is enabled and running
> Host looks OK
> Cluster fsid: dba72000-8525-11ec-b1e7-0015171590ba
> Verifying IP 10.175.41.11 port 3300 ...
> Verifying IP 10.175.41.11 port 6789 ...
> Mon IP `10.175.41.11` is in CIDR network `10.175.41.0/24`
> Ceph version: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
> Extracting ceph user uid/gid from container image...
> Creating initial keys...
> Creating initial monmap...
> Creating mon...
> Waiting for mon to start...
> Waiting for mon...
> mon is available
> Setting mon public_network to 10.175.41.0/24
> Setting cluster_network to 10.175.42.0/24
> Wrote config to /etc/ceph/ceph.conf
> Wrote keyring to /etc/ceph/ceph.client.admin.keyring
> Creating mgr...
> Verifying port 9283 ...
> Waiting for mgr to start...
> Waiting for mgr...
> mgr not available, waiting (1/15)...
> mgr not available, waiting (2/15)...
> mgr not available, waiting (3/15)...
> mgr not available, waiting (4/15)...
> mgr is available
> Enabling cephadm module...
> Waiting for the mgr to restart...
> Waiting for mgr epoch 5...
> mgr epoch 5 is available
> Setting orchestrator backend to cephadm...
> Using provided ssh keys...
> Adding host hcictrl01...
> Deploying mon service with default placement...
> Deploying mgr service with default placement...
> Deploying crash service with default placement...
> Enabling the dashboard module...
> Waiting for the mgr to restart...
> Waiting for mgr epoch 9...
> mgr epoch 9 is available
> Generating a dashboard self-signed certificate...
> Creating initial admin user...
> Fetching dashboard port number...
> Ceph Dashboard is now available at:
>
>              URL: https://hcictrl01.enclouden.com:8443/
>             User: admin
>         Password: J959ABCFRFGE
>
> Enabling client.admin keyring and conf on hosts with "admin" label
> You can access the Ceph CLI with:
>
>         sudo /sbin/cephadm shell --fsid dba72000-8525-11ec-b1e7-0015171590ba -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
>
> Please consider enabling telemetry to help improve Ceph:
>
>         ceph telemetry on
>
> For more information see:
>
>         https://docs.ceph.com/docs/pacific/mgr/telemetry/
>
> Bootstrap complete.
>
> List of containers created after bootstrap:
>
> [root@hcictrl01 stack_orchestrator]# podman ps
> CONTAINER ID  IMAGE                                             COMMAND               CREATED             STATUS                 PORTS  NAMES
> c7bfdf3b5831  quay.io/ceph/ceph:v16.2.7                         -n mon.hcictrl01 ...  7 minutes ago       Up 7 minutes ago              ceph-dba72000-8525-11ec-b1e7-0015171590ba-mon-hcictrl01
> 67c1e6f2ff1f  quay.io/ceph/ceph:v16.2.7                         -n mgr.hcictrl01....  7 minutes ago       Up 7 minutes ago              ceph-dba72000-8525-11ec-b1e7-0015171590ba-mgr-hcictrl01-fvopfn
> 6e87fba9235d  docker.io/ceph/daemon-base:latest-pacific-devel   -n client.crash.h...  About a minute ago  Up About a minute ago         ceph-dba72000-8525-11ec-b1e7-0015171590ba-crash-hcictrl01
>
> [root@hcictrl01 stack_orchestrator]# ceph orch ps
> NAME                  HOST       PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID
> crash.hcictrl01       hcictrl01          running (87s)  83s ago    87s  6975k    -        16.2.5-387-g7282d81d  41387741ad94  6e87fba9235d
> mgr.hcictrl01.fvopfn  hcictrl01  *:9283  running (7m)   83s ago    7m   399M     -        16.2.7                231fd40524c4  67c1e6f2ff1f
> mon.hcictrl01         hcictrl01          running (8m)   83s ago    8m   45.4M    2048M    16.2.7                231fd40524c4  c7bfdf3b5831
>
> [root@hcictrl01 stack_orchestrator]# podman images
> REPOSITORY                  TAG                   IMAGE ID      CREATED       SIZE
> quay.io/ceph/ceph           v16.2.7               231fd40524c4  2 days ago    1.39 GB
> docker.io/ceph/daemon-base  latest-pacific-devel  41387741ad94  5 months ago  1.23 GB
>
> As you can see, the crash daemon is getting created from the image
> 'docker.io/ceph/daemon-base:latest-pacific-devel' and is not respecting
> the --image flag provided. Also, we are not setting any config elsewhere
> other than the bootstrap conf file.
>
> I have also attached the full cephadm log; hope you can view it from the
> email. Let me know if you need any further data.
>
> Thanks in advance.
>
> Regards,
> Arun Vinod
>
> On Fri, 4 Feb 2022 at 00:17, Adam King <adking@xxxxxxxxxx> wrote:
>
>>> But, even if I gave the --image flag with bootstrap, the daemons created
>>> by the mgr module are using the daemon-base image, in our case
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>> Which I guess is because the mgr daemon takes into consideration the
>>> configuration parameter 'container_image', whose default value is
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>> What we guess is that even if we provide the --image flag in cephadm
>>> bootstrap, cephadm is not updating the variable container_image with this
>>> value. Hence, all the remaining daemons are getting created using the
>>> daemon-base image.
>>
>> This is not how it's supposed to work. If you provide "--image
>> <image-name>" to bootstrap, all ceph daemons deployed, including the
>> mon/mgr deployed during bootstrap AND the daemons deployed by the cephadm
>> mgr module afterwards, should be deployed with the image provided to the
>> "--image" parameter. You shouldn't need to set any config options or do
>> anything extra to get that to work. If you're providing "--image" to
>> bootstrap and this is not happening, there is a serious bug (not counting
>> the fact that the bootstrap mgr/mon show the tag while the others show the
>> digest; that's purely cosmetic). If that's the case, could you post the
>> full bootstrap output and the contents of the config file you're passing
>> to bootstrap, and maybe we can debug it. I've never seen this issue before
>> anywhere else, so I have no way to recreate it (for me, passing --image in
>> bootstrap causes all ceph daemons to be deployed with that image until I
>> explicitly specify another image through upgrade or other means).
>>
>>> Also, the non-uniform behaviour of the first mon, even though it was
>>> created using the same image, is quite surprising. I double-checked the
>>> configuration of all the mons and could not find a major difference
>>> between the first and the remaining mons. I tried to reconfig the first
>>> mon, which ended up in the same situation. However, redeploying that
>>> specific mon with the command 'ceph orch redeploy <name>
>>> quay.io/ceph/ceph:v16.2.7' caused the first mon to also show the same
>>> warning as the rest, as it got redeployed by the mgr.
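>>
>> (A quick aside on that: if you want to pin down what, if anything, actually
>> differs between the bootstrap mon and the ones deployed by the mgr module,
>> a rough check would be something like the following. The daemon name is just
>> the one from your 'ceph orch ps' listing, and I'm assuming here that
>> container_image is the option the module reads for new daemons:
>>
>>     # options the running bootstrap mon actually has
>>     ceph config show mon.hcictrl01
>>     # image the cephadm module should use for anything it deploys
>>     ceph config get mgr container_image
>>
>> Treat that as a sketch rather than a definitive diagnostic.)
>>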
>>
>>> Are we expecting any difference between the mon deployed by cephadm
>>> bootstrap and the mon deployed by the mgr, even if we're using the same
>>> image? We have only the lack of the warning on the first mon to suggest
>>> that there might be a difference between the first mon and the rest of
>>> the mons.
>>
>> I could maybe see some difference if you add specific config options, as
>> the mon deployed during bootstrap is deployed with basic settings. Since
>> we can't infer config settings into the mon store until there is an
>> existing monitor, this is sort of necessary and could maybe cause some
>> differences between that mon and the others. This should be resolved by a
>> redeploy of the mon. Can you tell me if you're setting any mon-related
>> config options in the conf you're providing to bootstrap (or if you've set
>> any config options elsewhere)? It may be that cephadm needs to actively
>> redeploy the mon if certain options are provided, and I can look into it
>> if I know which sorts of config options are causing the health warning. I
>> haven't seen that health warning in my own testing (on the bootstrap mon
>> or on those deployed by the mgr module), so I'd need to know what's
>> causing it in order to come up with a good fix.
>>
>> - Adam King
>>
>> On Thu, Feb 3, 2022 at 11:29 AM Arun Vinod <arunvinod.tech@xxxxxxxxx>
>> wrote:
>>
>>> Hi Adam,
>>>
>>> Thanks for reviewing the long output.
>>>
>>> Like you said, it makes total sense now, since the first mon and mgr are
>>> created by cephadm bootstrap and the rest of the daemons by the mgr
>>> module.
>>>
>>> But, even if I gave the --image flag with bootstrap, the daemons created
>>> by the mgr module are using the daemon-base image, in our case
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>> Which I guess is because the mgr daemon takes into consideration the
>>> configuration parameter 'container_image', whose default value is
>>> 'docker.io/ceph/daemon-base:latest-pacific-devel'.
>>>
>>> What we guess is that even if we provide the --image flag in cephadm
>>> bootstrap, cephadm is not updating the variable container_image with this
>>> value. Hence, all the remaining daemons are getting created using the
>>> daemon-base image.
>>>
>>> Below is the value of the config 'container_image' after bootstrapping
>>> with the --image flag provided.
>>>
>>> [root@hcictrl01 stack_orchestrator]# ceph-conf -D | grep -i container_image
>>> container_image = docker.io/ceph/daemon-base:latest-pacific-devel
>>>
>>> However, one workaround is to provide this config in the initial
>>> bootstrap config file and present it to the cephadm bootstrap using the
>>> flag --config, which updates the image name, and all the daemons then get
>>> created with the same image.
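>>>
>>> (For reference, a minimal sketch of what that workaround looks like in
>>> the conf file we pass to bootstrap via --config, assuming the same tag we
>>> give to --image:
>>>
>>>     [global]
>>>     container_image = quay.io/ceph/ceph:v16.2.7   # same image tag passed to --image
>>>
>>> With that line present, ceph-conf -D reports the quay image and the
>>> remaining daemons are created from it.)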
>>>
>>> Also, the non-uniform behaviour of the first mon, even though it was
>>> created using the same image, is quite surprising. I double-checked the
>>> configuration of all the mons and could not find a major difference
>>> between the first and the remaining mons. I tried to reconfig the first
>>> mon, which ended up in the same situation. However, redeploying that
>>> specific mon with the command 'ceph orch redeploy <name>
>>> quay.io/ceph/ceph:v16.2.7' caused the first mon to also show the same
>>> warning as the rest, as it got redeployed by the mgr.
>>>
>>> Are we expecting any difference between the mon deployed by cephadm
>>> bootstrap and the mon deployed by the mgr, even if we're using the same
>>> image? We have only the lack of the warning on the first mon to suggest
>>> that there might be a difference between the first mon and the rest of
>>> the mons.
>>>
>>> Thanks again, Adam, for checking this. Your insights into this will be
>>> highly appreciated.
>>>
>>> Thanks and Regards,
>>> Arun Vinod
>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx