Ceph is constantly scrubbing 1/4 of all PGs and still has PGs not scrubbed in time


 



I recently upgraded from 16.2.x to 18.2.x and migrated to cephadm. Since the switch the cluster has been scrubbing constantly, 24/7, with up to 50 PGs scrubbing and up to 20 deep scrubs running simultaneously in a cluster that has only 12 (in use) OSDs.
On top of that it still regularly raises a ‘pgs not scrubbed in time’ warning.

I have tried various settings, such as osd_deep_scrub_interval, osd_max_scrubs, mds_max_scrub_ops_in_progress, etc.
All of them seem to be ignored.
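
For reference, this is roughly how I applied and verified the values; osd_max_scrubs and the numbers below are only examples, not a recommendation:

ceph config set osd osd_max_scrubs 1                   # one concurrent scrub per OSD
ceph config set osd osd_deep_scrub_interval 1209600    # e.g. two weeks, in seconds

# check what a running OSD actually uses (osd.0 is just an example)
ceph config show osd.0 | grep -E 'osd_max_scrubs|scrub_interval'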

Please advise.

Here is the output of ceph config dump:

WHO         MASK  LEVEL     OPTION                                       VALUE                                                                                      RO
global            advanced  auth_client_required                         cephx                                                                                      *
global            advanced  auth_cluster_required                        cephx                                                                                      *
global            advanced  auth_service_required                        cephx                                                                                      *
global            advanced  auth_supported                               cephx                                                                                      *
global            basic     container_image                              quay.io/ceph/ceph@sha256:aca35483144ab3548a7f670db9b79772e6fc51167246421c66c0bd56a6585468  *
global            basic     device_failure_prediction_mode               local
global            advanced  mon_allow_pool_delete                        true
global            advanced  mon_data_avail_warn                          20
global            advanced  mon_max_pg_per_osd                           400
global            advanced  osd_max_pg_per_osd_hard_ratio                10.000000
global            advanced  osd_pool_default_pg_autoscale_mode           on
mon               advanced  auth_allow_insecure_global_id_reclaim        false
mon               advanced  mon_crush_min_required_version               firefly                                                                                    *
mon               advanced  mon_warn_on_pool_no_redundancy               false
mon               advanced  public_network                               10.79.0.0/16                                                                               *
mgr               advanced  mgr/balancer/active                          true
mgr               advanced  mgr/balancer/mode                            upmap
mgr               advanced  mgr/cephadm/manage_etc_ceph_ceph_conf_hosts  label:admin                                                                                *
mgr               advanced  mgr/cephadm/migration_current                6                                                                                          *
mgr               advanced  mgr/dashboard/GRAFANA_API_PASSWORD           admin                                                                                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY         false                                                                                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_URL                https://10.79.79.12:3000                                                                   *
mgr               advanced  mgr/dashboard/PROMETHEUS_API_HOST            http://10.79.79.12:9095                                                                    *
mgr               advanced  mgr/devicehealth/enable_monitoring           true
mgr               advanced  mgr/orchestrator/orchestrator                cephadm
osd               advanced  osd_map_cache_size                           250
osd               advanced  osd_map_share_max_epochs                     50
osd               advanced  osd_mclock_profile                           high_client_ops
osd               advanced  osd_pg_epoch_persisted_max_stale             50
osd.0             basic     osd_mclock_max_capacity_iops_hdd             380.869888
osd.1             basic     osd_mclock_max_capacity_iops_hdd             441.000000
osd.10            basic     osd_mclock_max_capacity_iops_ssd             13677.906485
osd.11            basic     osd_mclock_max_capacity_iops_hdd             274.411212
osd.13            basic     osd_mclock_max_capacity_iops_hdd             198.492501
osd.2             basic     osd_mclock_max_capacity_iops_hdd             251.592009
osd.3             basic     osd_mclock_max_capacity_iops_hdd             208.197434
osd.4             basic     osd_mclock_max_capacity_iops_hdd             196.544082
osd.5             basic     osd_mclock_max_capacity_iops_ssd             12739.225456
osd.6             basic     osd_mclock_max_capacity_iops_hdd             211.288660
osd.7             basic     osd_mclock_max_capacity_iops_hdd             210.543236
osd.8             basic     osd_mclock_max_capacity_iops_hdd             242.241594
osd.9             basic     osd_mclock_max_capacity_iops_hdd             559.933780
mds.plexfs        basic     mds_join_fs                                  plexfs
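
Since the dump shows osd_mclock_profile = high_client_ops, I also looked at what the running OSDs report, in case the mClock profile is overriding my scrub settings (osd.0 is just an example, and I am not certain this is the cause):

ceph config show osd.0 | grep -E 'osd_op_queue|osd_mclock_profile|osd_max_scrubs'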





Here is the output of ceph -s:
services:
    mon: 3 daemons, quorum lxt-prod-ceph-util02,lxt-prod-ceph-util01,lxt-prod-ceph-util03 (age 3w)
    mgr: lxt-prod-ceph-util02.iyrhxj(active, since 3w), standbys: lxt-prod-ceph-util03.wvstpe
    mds: 1/1 daemons up
    osd: 14 osds: 14 up (since 4w), 14 in (since 4w)
data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 14.48M objects, 52 TiB
    usage:   71 TiB used, 39 TiB / 110 TiB avail
    pgs:     131 active+clean
             47  active+clean+scrubbing
             15  active+clean+scrubbing+deep
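
To see which PGs are behind and to kick one manually I have been using the following (the PG id 2.1f is made up for illustration):

ceph health detail | grep 'not scrubbed'
ceph pg deep-scrub 2.1f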