Hi Eugen,

Thank you so much for the details. Here is the update (comments in-line >>):

Regards,
Anantha

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, June 19, 2023 5:27 AM
To: ceph-users@xxxxxxx
Subject: Re: Grafana service fails to start due to bad directory name after Quincy upgrade

Hi,

so grafana is starting successfully now? What did you change?

>> I stopped and removed the Grafana image and started it from the "Ceph Dashboard" service. The version is still 6.7.4. I also had to change the following. I do not have a way to make this permanent; if the service is redeployed, I will lose the changes. I did not save the file that cephadm generated, which was one reason why the Grafana service would not start. I had to replace it with the one below to resolve this issue:

   [users]
   default_theme = light
   [auth.anonymous]
   enabled = true
   org_name = 'Main Org.'
   org_role = 'Viewer'
   [server]
   domain = 'bootstrap.storage.lab'
   protocol = https
   cert_file = /etc/grafana/certs/cert_file
   cert_key = /etc/grafana/certs/cert_key
   http_port = 3000
   http_addr =
   [snapshots]
   external_enabled = false
   [security]
   disable_initial_admin_creation = false
   cookie_secure = true
   cookie_samesite = none
   allow_embedding = true
   admin_password = paswd-value
   admin_user = user-name

Also, this was the other change, made to /var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/etc/grafana/provisioning/datasources/ceph-dashboard.yml:

   # This file is generated by cephadm.
   apiVersion: 1    <-- this is the line that was added

Regarding the container images, yes there are defaults in cephadm which can be overridden with ceph config. Can you share this output?

ceph config dump | grep container_image

>> Here it is:

   root@fl31ca104ja0201:/# ceph config dump | grep container_image
   global                                             basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
   mgr                                                advanced  mgr/cephadm/container_image_alertmanager   docker.io/prom/alertmanager:v0.16.2   *
   mgr                                                advanced  mgr/cephadm/container_image_base            quay.io/ceph/daemon
   mgr                                                advanced  mgr/cephadm/container_image_grafana         docker.io/grafana/grafana:6.7.4       *
   mgr                                                advanced  mgr/cephadm/container_image_node_exporter   docker.io/prom/node-exporter:v0.17.0  *
   mgr                                                advanced  mgr/cephadm/container_image_prometheus      docker.io/prom/prometheus:v2.7.2      *
   client.rgw.default.default.fl31ca104ja0201.ninovs  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
   client.rgw.default.default.fl31ca104ja0202.yhjkmb  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
   client.rgw.default.default.fl31ca104ja0203.fqnriq  basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *

I tend to always use a specific image as described here [2]. I also haven't deployed grafana via dashboard yet, so I can't really comment on that, nor on the warnings you report.

>> OK. The need for that is: in Quincy, when you enable Loki and Promtail to view the daemon logs, the Ceph dashboard pulls in the Grafana dashboard. I will let you know once that issue is resolved.

Regards,
Eugen

[2] https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images

>> Thank you, I am following the document now.
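>> Based on [2], my plan to make this survive redeploys is to pin the Grafana image and hand the customized grafana.ini to cephadm instead of editing the file inside the container. A rough sketch of what I understood from the docs -- the config-key path and the 8.3.5 image tag are my assumptions from [2] and the Quincy release notes, and I have not run this yet:

   # Pin the image cephadm deploys for Grafana (persists across redeploys)
   ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5

   # Store the customized grafana.ini so cephadm applies it when it regenerates the daemon
   ceph config-key set mgr/cephadm/services/grafana/grafana.ini -i grafana.ini

   # Recreate the Grafana daemon with the new image and config
   ceph orch redeploy grafana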
Quoting "Adiga, Anantha" <anantha.adiga@xxxxxxxxx>:

> Hi Eugen,
>
> Thank you for your response, here is the update.
>
> The upgrade to Quincy was done following the cephadm orch upgrade
> procedure:
>
>     ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
>
> The upgrade completed without errors. After the upgrade, upon creating
> the Grafana service from the Ceph dashboard, it deployed Grafana 6.7.4.
> The version is hardcoded in the code; should it not be 8.3.5, as listed
> in the Quincy documentation below?
>
> [Grafana service started from Ceph dashboard]
>
> The Quincy documentation (https://docs.ceph.com/en/latest/releases/quincy/) states:
>
> ...documentation snippet:
> Monitoring and alerting:
> 43 new alerts have been added (totalling 68) improving observability
> of events affecting: cluster health, monitors, storage devices, PGs
> and CephFS.
> Alerts can now be sent externally as SNMP traps via the new SNMP
> gateway service (the MIB is provided).
> Improved integrated full/nearfull event notifications.
> Grafana Dashboards now use grafonnet format (though they're still
> available in JSON format).
> Stack update: images for monitoring containers have been updated.
> Grafana 8.3.5, Prometheus 2.33.4, Alertmanager 0.23.0 and Node
> Exporter 1.3.1. This reduced exposure to several Grafana
> vulnerabilities (CVE-2021-43798, CVE-2021-39226, CVE-2021-43798,
> CVE-2020-29510, CVE-2020-29511).
> ...
>
> I notice that the versions of the remaining stack that the Ceph
> dashboard deploys are also older than what is documented:
> Prometheus 2.7.2, Alertmanager 0.16.2 and Node Exporter 0.17.0.
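> (Aside: I assume each of these can be pinned individually via the
> mgr/cephadm/container_image_* options, something like the sketch below
> with the versions from the release notes. The registry paths are my
> guess at the upstream defaults and are not verified:
>
>     ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.33.4
>     ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
>     ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.3.1
>
>     # redeploy so the daemons are recreated from the new images
>     ceph orch redeploy prometheus
>     ceph orch redeploy alertmanager
>     ceph orch redeploy node-exporter
> )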
> Also, the Grafana 6.7.4 service reports a few warnings (see the lvl=warn lines below):
>
> root@fl31ca104ja0201:/home/general# systemctl status ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104ja0201.service
> ● ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104ja0201.service - Ceph grafana.fl31ca104ja0201 for d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e
>      Loaded: loaded (/etc/systemd/system/ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@.service; enabled; vendor preset: enabled)
>      Active: active (running) since Tue 2023-06-13 03:37:58 UTC; 11h ago
>    Main PID: 391896 (bash)
>       Tasks: 53 (limit: 618607)
>      Memory: 17.9M
>      CGroup: /system.slice/system-ceph\x2dd0a3b6e0\x2dd2c3\x2d11ed\x2dbe05\x2da7a3a1d7a87e.slice/ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e@grafana.fl31ca104j>
>              ├─391896 /bin/bash /var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/unit.run
>              └─391969 /usr/bin/docker run --rm --ipc=host --stop-signal=SIGTERM --net=host --init --name ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-grafana-fl>
>
> -- Logs begin at Sun 2023-06-11 20:41:51 UTC, end at Tue 2023-06-13 15:35:12 UTC. --
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="alter user_auth.auth_id to length 190"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth access token to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth refresh token to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth token type to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add OAuth expiry to user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="Add index to user_id column in user_auth"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create server_lock table"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add index server_lock.operation_uid"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create user auth token table"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index user_auth_token.auth_token"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index user_auth_token.prev_auth_token"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="create cache_data table"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index cache_data.cache_key"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Created default organization" logger=sqlstore
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing HTTPServer" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing BackendPluginManager" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing PluginManager" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Starting plugin search" logger=plugins
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing HooksService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing OSSLicensingService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing InternalMetricsService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing RemoteCache" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing RenderingService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing AlertEngine" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing QuotaService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing ServerLockService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing UserAuthTokenService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing DatasourceCacheService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing LoginService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing SearchService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing TracingService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing UsageStatsService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing CleanUpService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing NotificationService" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing provisioningServiceImpl" logger=server
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=warn msg="[Deprecated] the datasource provisioning config is outdated. please upgrade" logger=provisioning.datasources filename=/etc/grafana/provisioning/datasources/ceph-dashboard.yml
> This warning is due to the missing "apiVersion: 1" first-line entry in
> /etc/grafana/provisioning/datasources/ceph-dashboard.yml as created by
> cephadm. If the file is modified to include the apiVersion line and the
> Grafana service is restarted, the warning is resolved.
>
> Is this a known issue?
>
> Here is the content of the ceph-dashboard.yml produced by cephadm:
>
>     deleteDatasources:
>       - name: 'Dashboard1'
>         orgId: 1
>
>       - name: 'Loki'
>         orgId: 2
>
>     datasources:
>       - name: 'Dashboard1'
>         type: 'prometheus'
>         access: 'proxy'
>         orgId: 1
>         url: 'http://fl31ca104ja0201.xxx.xxx.com:9095'
>         basicAuth: false
>         isDefault: true
>         editable: false
>
>       - name: 'Loki'
>         type: 'loki'
>         access: 'proxy'
>         orgId: 2
>         url: ''
>         basicAuth: false
>         isDefault: true
>         editable: false
> --------------------------------------------------------------
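> For reference, after the manual fix the provisioning file begins like
> this (only the "apiVersion: 1" line is added by hand; everything after
> it is exactly as generated):
>
>     # This file is generated by cephadm.
>     apiVersion: 1
>
>     deleteDatasources:
>       - name: 'Dashboard1'
>         orgId: 1
>     ...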
> (journal continues:)
>
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Dashboard1
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Loki
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Backend rendering via phantomJS" logger=rendering renderer=phantomJS
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=warn msg="phantomJS is deprecated and will be removed in a future release. You should consider migrating from phantomJS to grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/" logger=rendering renderer=phantomJS
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="Initializing Stream Manager"
> Jun 13 03:37:59 fl31ca104ja0201 bash[391969]: t=2023-06-13T03:37:59+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=https subUrl= socket=
>
> I also had to change a few other things to keep all the services
> running. The last issue that I have not been able to resolve yet is
> that the Ceph dashboard gives this error even though Grafana is running
> on the same server. However, the Grafana dashboard cannot be accessed
> without tunnelling.
>
> [cid:image002.png@01D9A10B.F8B9D220]
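>> For the dashboard error above, one thing I still plan to verify is the Grafana URL the Ceph dashboard is configured to embed. A minimal check; the FQDN below is taken from the datasource config and the port from grafana.ini, so adjust as needed:

   # show the URL the Ceph dashboard currently uses for Grafana
   ceph dashboard get-grafana-api-url

   # point it at an address reachable from the browser
   ceph dashboard set-grafana-api-url https://fl31ca104ja0201.xxx.xxx.com:3000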