Hi Eugen,

Thank you so much for the details. Here is the update (comments in-line >>):

Regards,
Anantha

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, June 19, 2023 5:27 AM
To: ceph-users@xxxxxxx
Subject: Re: Grafana service fails to start due to bad directory name after Quincy upgrade

Hi,

so grafana is starting successfully now? What did you change?

>> I stopped and removed the Grafana image and started it from the "Ceph Dashboard" service. The version is still 6.7.4. I also had to change the following. I do not have a way to make this permanent; if the service is redeployed, I will lose the changes. I did not save the file that cephadm generated, which was one reason why the Grafana service would not start. I had to replace it with the one below to resolve this issue:

   [users]
   default_theme = light
   [auth.anonymous]
   enabled = true
   org_name = 'Main Org.'
   org_role = 'Viewer'
   [server]
   domain = 'bootstrap.storage.lab'
   protocol = https
   cert_file = /etc/grafana/certs/cert_file
   cert_key = /etc/grafana/certs/cert_key
   http_port = 3000
   http_addr =
   [snapshots]
   external_enabled = false
   [security]
   disable_initial_admin_creation = false
   cookie_secure = true
   cookie_samesite = none
   allow_embedding = true
   admin_password = paswd-value
   admin_user = user-name

Also, this was the other change, made to /var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/etc/grafana/provisioning/datasources/ceph-dashboard.yml:

   # This file is generated by cephadm.
   apiVersion: 1    <-- this is the line that was added

Regarding the container images, yes there are defaults in cephadm which can be overridden with ceph config. Can you share this output?

ceph config dump | grep container_image

>> Here it is:

   root@fl31ca104ja0201:/# ceph config dump | grep container_image
   global                                             basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
   mgr                                                advanced  mgr/cephadm/container_image_alertmanager   docker.io/prom/alertmanager:v0.16.2   *
   mgr                                                advanced  mgr/cephadm/container_image_base            quay.io/ceph/daemon
   mgr                                                advanced  mgr/cephadm/container_image_grafana         docker.io/grafana/grafana:6.7.4       *
   mgr                                                advanced  mgr/cephadm/container_image_node_exporter   docker.io/prom/node-exporter:v0.17.0  *
   mgr                                                advanced  mgr/cephadm/container_image_prometheus      docker.io/prom/prometheus:v2.7.2      *
   client.rgw.default.default.fl31ca104ja0201.ninovs  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
   client.rgw.default.default.fl31ca104ja0202.yhjkmb  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
   client.rgw.default.default.fl31ca104ja0203.fqnriq  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *

I tend to always use a specific image as described here [2]. I also haven't deployed grafana via dashboard yet, so I can't really comment on that, nor on the warnings you report.

>> OK. The need for that is: in Quincy, when you enable Loki and Promtail to view the daemon logs, the Ceph dashboard pulls in the Grafana dashboard. I will let you know once that issue is resolved.

Regards,
Eugen

[2] https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images

>> Thank you, I am following the document now.
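>> Based on [2], my plan to make this survive redeploys is to pin the Grafana image and hand the customized grafana.ini to cephadm instead of editing the file inside the container. A rough sketch of what I understood from the docs -- the config-key path and the 8.3.5 image tag are my assumptions from [2] and the Quincy release notes, and I have not run this yet:

   # Pin the image cephadm deploys for Grafana (persists across redeploys)
   ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5

   # Store the customized grafana.ini so cephadm applies it when it regenerates the daemon
   ceph config-key set mgr/cephadm/services/grafana/grafana.ini -i grafana.ini

   # Recreate the Grafana daemon with the new image and config
   ceph orch redeploy grafana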
Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:

> Hi Eugen,
>
> Thank you for your response, here is the update.
>
> The upgrade to Quincy was done following the cephadm orch upgrade
> procedure:
>
>     ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
>
> The upgrade completed without errors. After the upgrade, upon creating
> the Grafana service from the Ceph dashboard, it deployed Grafana 6.7.4.
> The version is hardcoded in the code; should it not be 8.3.5, as listed
> in the Quincy documentation below?
>
> [Grafana service started from Ceph dashboard]
>
> The Quincy documentation (https://docs.ceph.com/en/latest/releases/quincy/) states:
>
> ...documentation snippet:
> Monitoring and alerting:
> 43 new alerts have been added (totalling 68) improving observability
> of events affecting: cluster health, monitors, storage devices, PGs
> and CephFS.
> Alerts can now be sent externally as SNMP traps via the new SNMP
> gateway service (the MIB is provided).
> Improved integrated full/nearfull event notifications.
> Grafana Dashboards now use grafonnet format (though they're still
> available in JSON format).
> Stack update: images for monitoring containers have been updated.
> Grafana 8.3.5, Prometheus 2.33.4, Alertmanager 0.23.0 and Node
> Exporter 1.3.1. This reduced exposure to several Grafana
> vulnerabilities (CVE-2021-43798, CVE-2021-39226, CVE-2021-43798,
> CVE-2020-29510, CVE-2020-29511).
> ...
>
> I notice that the versions of the remaining stack that the Ceph
> dashboard deploys are also older than what is documented:
> Prometheus 2.7.2, Alertmanager 0.16.2 and Node Exporter 0.17.0.
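> (Aside: I assume each of these can be pinned individually via the
> mgr/cephadm/container_image_* options, something like the sketch below
> with the versions from the release notes. The registry paths are my
> guess at the upstream defaults and are not verified:
>
>     ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.33.4
>     ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
>     ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.3.1
>
>     # redeploy so the daemons are recreated from the new images
>     ceph orch redeploy prometheus
>     ceph orch redeploy alertmanager
>     ceph orch redeploy node-exporter
> )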
> Also, the Grafana 6.7.4 service reports a few warnings (see the lvl=warn lines below):
>
> root@fl31ca104ja0201:/home/general# systemctl status ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104ja0201.service
> ● ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104ja0201.service - Ceph grafana.fl31ca104ja0201 for d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>      Loaded: loaded (/etc/systemd/system/ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@.service; enabled; vendor preset: enabled)
>      Active: active (running) since Tue 2023-06-13 03:37:58 UTC; 11h ago
>    Main PID: 391896 (bash)
>       Tasks: 53 (limit: 618607)
>      Memory: 17.9M
>      CGroup: /system.slice/system-ceph\x2dd0a3b6e0\x2dd2c3\x2d11ed\x2dbe05\x2da7a3a1d7a87e.slice/ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104j>
>              ├─391896 /bin/bash /var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/unit.run
>              └─391969 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --init --name ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-grafana-fl>
>
> -- Logs begin at Sun 2023-06-11 20:41:51 UTC, end at Tue 2023-06-13 15:35:12 UTC. --
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="alter user_auth.auth_id to length 190"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth access token to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth refresh token to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth token type to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth expiry to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add index to user_id column in user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create server_lock table"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add index server_lock.operation_uid"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create user auth token table"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index user_auth_token.auth_token"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index user_auth_token.prev_auth_token"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create cache_data table"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index cache_data.cache_key"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Created default organization" logger=sqlstore
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing HTTPServer" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing BackendPluginManager" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing PluginManager" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Starting plugin search" logger=plugins
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing HooksService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing OSSLicensingService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing InternalMetricsService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing RemoteCache" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing RenderingService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing AlertEngine" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing QuotaService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing ServerLockService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing UserAuthTokenService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing DatasourceCacheService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing LoginService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing SearchService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing TracingService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing UsageStatsService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing CleanUpService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing NotificationService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing provisioningServiceImpl" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=warn msg="[Deprecated] the datasource provisioning config is outdated. please upgrade" logger=provisioning.datasources filename=/etc/grafana/provisioning/datasources/ceph-dashboard.yml
> This warning is due to the missing "apiVersion: 1" first-line entry in
> /etc/grafana/provisioning/datasources/ceph-dashboard.yml as created by
> cephadm. If the file is modified to include the apiVersion line and the
> Grafana service is restarted, the warning is resolved.
>
> Is this a known issue?
>
> Here is the content of the ceph-dashboard.yml produced by cephadm:
>
>     deleteDatasources:
>       - name: 'Dashboard1'
>         orgId: 1
>
>       - name: 'Loki'
>         orgId: 2
>
>     datasources:
>       - name: 'Dashboard1'
>         type: 'prometheus'
>         access: 'proxy'
>         orgId: 1
>         url: 'http://fl31ca104ja0201.xxx.xxx.com:9095'
>         basicAuth: false
>         isDefault: true
>         editable: false
>
>       - name: 'Loki'
>         type: 'loki'
>         access: 'proxy'
>         orgId: 2
>         url: ''
>         basicAuth: false
>         isDefault: true
>         editable: false
> --------------------------------------------------------------
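> For reference, after the manual fix the provisioning file begins like
> this (only the "apiVersion: 1" line is added by hand; everything after
> it is exactly as generated):
>
>     # This file is generated by cephadm.
>     apiVersion: 1
>
>     deleteDatasources:
>       - name: 'Dashboard1'
>         orgId: 1
>     ...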
> (journal continues:)
>
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Dashboard1
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Loki
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Backend rendering via phantomJS" logger=rendering renderer=phantomJS
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=warn msg="phantomJS is deprecated and will be removed in a future release. You should consider migrating from phantomJS to grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/" logger=rendering renderer=phantomJS
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing Stream Manager"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=https subUrl= socket=
>
> I also had to change a few other things to keep all the services
> running. The last issue that I have not been able to resolve yet is
> that the Ceph dashboard gives this error even though Grafana is running
> on the same server. However, the Grafana dashboard cannot be accessed
> without tunnelling.
>
> [cid:image002.png@01D9A10B.F8B9D220]
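>> For the dashboard error above, one thing I still plan to verify is the Grafana URL the Ceph dashboard is configured to embed. A minimal check; the FQDN below is taken from the datasource config and the port from grafana.ini, so adjust as needed:

   # show the URL the Ceph dashboard currently uses for Grafana
   ceph dashboard get-grafana-api-url

   # point it at an address reachable from the browser
   ceph dashboard set-grafana-api-url https://fl31ca104ja0201.xxx.xxx.com:3000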