Are the scrubs eventually reported as "scrub ok" in the OSD logs? How
long do the scrubs take? Do you see updated timestamps in the 'ceph pg
dump' output (column DEEP_SCRUB_STAMP)?
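Something along these lines should show both, just as a sketch (I'm assuming a cephadm deployment and using osd.0 as an arbitrary example):

# recent scrub completions for one OSD (with cephadm the daemon logs go to journald)
cephadm logs --name osd.0 | grep 'scrub ok' | tail
# per-PG scrub timestamps; watch whether SCRUB_STAMP / DEEP_SCRUB_STAMP move forward
ceph pg dump pgs | less -S

If the stamps do advance, the scrubs are finishing and it is the scheduling that is too aggressive, rather than scrubs getting stuck.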
Quoting thymus_03fumbler@xxxxxxxxxx:
I recently switched from 16.2.x to 18.2.x and migrated to cephadm.
Since the switch the cluster has been scrubbing constantly, 24/7, with
up to 50 PGs scrubbing and up to 20 deep scrubs running simultaneously
in a cluster that has only 12 OSDs in use.
On top of that, it still regularly raises a ‘pgs not scrubbed in time’
warning.
I have tried various settings, such as osd_deep_scrub_interval,
osd_max_scrubs, mds_max_scrub_ops_in_progress etc., but they all seem
to be ignored.
Please advise.
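For reference, this is roughly how I set and then verify those options
(osd.0 and the values are only examples):

# set cluster-wide defaults for the osd class
ceph config set osd osd_max_scrubs 1
ceph config set osd osd_deep_scrub_interval 1209600   # example: 14 days in seconds
# check what a running daemon actually reports
ceph config show osd.0 osd_max_scrubs
ceph tell osd.0 config get osd_max_scrubs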
Here is the output of 'ceph config dump':
WHO         MASK  LEVEL     OPTION                                        VALUE                      RO
global            advanced  auth_client_required                          cephx                      *
global            advanced  auth_cluster_required                         cephx                      *
global            advanced  auth_service_required                         cephx                      *
global            advanced  auth_supported                                cephx                      *
global            basic     container_image                               quay.io/ceph/ceph@sha256:aca35483144ab3548a7f670db9b79772e6fc51167246421c66c0bd56a6585468  *
global            basic     device_failure_prediction_mode                local
global            advanced  mon_allow_pool_delete                         true
global            advanced  mon_data_avail_warn                           20
global            advanced  mon_max_pg_per_osd                            400
global            advanced  osd_max_pg_per_osd_hard_ratio                 10.000000
global            advanced  osd_pool_default_pg_autoscale_mode            on
mon               advanced  auth_allow_insecure_global_id_reclaim         false
mon               advanced  mon_crush_min_required_version                firefly                    *
mon               advanced  mon_warn_on_pool_no_redundancy                false
mon               advanced  public_network                                10.79.0.0/16               *
mgr               advanced  mgr/balancer/active                           true
mgr               advanced  mgr/balancer/mode                             upmap
mgr               advanced  mgr/cephadm/manage_etc_ceph_ceph_conf_hosts   label:admin                *
mgr               advanced  mgr/cephadm/migration_current                 6                          *
mgr               advanced  mgr/dashboard/GRAFANA_API_PASSWORD            admin                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY          false                      *
mgr               advanced  mgr/dashboard/GRAFANA_API_URL                 https://10.79.79.12:3000   *
mgr               advanced  mgr/dashboard/PROMETHEUS_API_HOST             http://10.79.79.12:9095    *
mgr               advanced  mgr/devicehealth/enable_monitoring            true
mgr               advanced  mgr/orchestrator/orchestrator                 cephadm
osd               advanced  osd_map_cache_size                            250
osd               advanced  osd_map_share_max_epochs                      50
osd               advanced  osd_mclock_profile                            high_client_ops
osd               advanced  osd_pg_epoch_persisted_max_stale              50
osd.0             basic     osd_mclock_max_capacity_iops_hdd              380.869888
osd.1             basic     osd_mclock_max_capacity_iops_hdd              441.000000
osd.10            basic     osd_mclock_max_capacity_iops_ssd              13677.906485
osd.11            basic     osd_mclock_max_capacity_iops_hdd              274.411212
osd.13            basic     osd_mclock_max_capacity_iops_hdd              198.492501
osd.2             basic     osd_mclock_max_capacity_iops_hdd              251.592009
osd.3             basic     osd_mclock_max_capacity_iops_hdd              208.197434
osd.4             basic     osd_mclock_max_capacity_iops_hdd              196.544082
osd.5             basic     osd_mclock_max_capacity_iops_ssd              12739.225456
osd.6             basic     osd_mclock_max_capacity_iops_hdd              211.288660
osd.7             basic     osd_mclock_max_capacity_iops_hdd              210.543236
osd.8             basic     osd_mclock_max_capacity_iops_hdd              242.241594
osd.9             basic     osd_mclock_max_capacity_iops_hdd              559.933780
mds.plexfs        basic     mds_join_fs                                   plexfs
Here is the output of 'ceph -s':
  services:
    mon: 3 daemons, quorum lxt-prod-ceph-util02,lxt-prod-ceph-util01,lxt-prod-ceph-util03 (age 3w)
    mgr: lxt-prod-ceph-util02.iyrhxj(active, since 3w), standbys: lxt-prod-ceph-util03.wvstpe
    mds: 1/1 daemons up
    osd: 14 osds: 14 up (since 4w), 14 in (since 4w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 14.48M objects, 52 TiB
    usage:   71 TiB used, 39 TiB / 110 TiB avail
    pgs:     131 active+clean
             47  active+clean+scrubbing
             15  active+clean+scrubbing+deep
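In case it helps, the PGs that are currently scrubbing can be listed
like this (just a sketch):

# PG id and state for everything that is scrubbing right now
ceph pg dump pgs_brief | grep scrubbing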
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx