Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

Hi Sage,

I uploaded a lot of debug logs from the OSDs and Mons:
ceph-post-file: 4ebc2eeb-7bb1-48c4-bbfa-ed581faca74f
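
For reference, the logs were uploaded with ceph-post-file roughly like
this (the paths are illustrative, not the exact file list I sent):

# ceph-post-file -d "slow ops / laggy pg during OSD 122 restart" \
      /var/log/ceph/ceph-osd.*.log /var/log/ceph/ceph-mon.*.log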

At 13:24:25 I stopped OSD 122, and one minute later I started it again.
In both cases I got slow ops.
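
The reproduction itself was essentially just this (a sketch, assuming a
systemd-managed, non-containerized OSD, so the unit name may differ):

# systemctl stop ceph-osd@122
# ceph health detail | grep -i slow    # slow ops show up here
# sleep 60
# systemctl start ceph-osd@122
# ceph health detail | grep -i slow    # and again after the start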

Currently I am running the upstream version (without the crude patches):
ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)

I hope you can work with it.


Here is the current config:

# ceph config dump
WHO     MASK  LEVEL     OPTION                                          VALUE     RO
global        advanced  osd_fast_shutdown                               false       
global        advanced  osd_fast_shutdown_notify_mon                    false       
global        dev       osd_pool_default_read_lease_ratio               0.800000    
global        advanced  paxos_propose_interval                          1.000000    
  mon         advanced  auth_allow_insecure_global_id_reclaim           true        
  mon         advanced  mon_warn_on_insecure_global_id_reclaim          false       
  mon         advanced  mon_warn_on_insecure_global_id_reclaim_allowed  false       
  mgr         advanced  mgr/balancer/active                             true        
  mgr         advanced  mgr/balancer/mode                               upmap       
  mgr         advanced  mgr/balancer/upmap_max_deviation                1           
  mgr         advanced  mgr/progress/enabled                            false     * 
  osd         dev       bluestore_fsck_quick_fix_on_mount               true        
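
The runtime values above were set with "ceph config set" and can be
checked per daemon with "ceph config get", for example (illustrative,
not an exact transcript):

# ceph config set global osd_fast_shutdown false
# ceph config set global osd_fast_shutdown_notify_mon false
# ceph config get osd.122 osd_fast_shutdown
false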

# cat /etc/ceph/ceph.conf 
[global]
    # The following parameters are defined in service.properties like this:
    # ceph.conf.global.osd_max_backfills: 1


  bluefs buffered io = true
  bluestore fsck quick fix on mount = false
  cluster network = 10.88.26.0/24
  fsid = 72ccd9c4-5697-478c-99f6-b5966af278c6
  max open files = 131072
  mon host = 10.88.7.41 10.88.7.42 10.88.7.43
  mon max pg per osd = 600
  mon osd down out interval = 1800
  mon osd down out subtree limit = host
  mon osd initial require min compat client = luminous
  mon osd min down reporters = 2
  mon osd reporter subtree level = host
  mon pg warn max object skew = 100
  osd backfill scan max = 16
  osd backfill scan min = 8
  osd deep scrub stride = 1048576
  osd disk threads = 1
  osd heartbeat min size = 0
  osd max backfills = 1
  osd max scrubs = 1
  osd op complaint time = 5
  osd pool default flag hashpspool = true
  osd pool default min size = 1
  osd pool default size = 3
  osd recovery max active = 1
  osd recovery max single start = 1
  osd recovery op priority = 3
  osd recovery sleep hdd = 0.0
  osd scrub auto repair = true
  osd scrub begin hour = 5
  osd scrub chunk max = 1
  osd scrub chunk min = 1
  osd scrub during recovery = true
  osd scrub end hour = 23
  osd scrub load threshold = 1
  osd scrub priority = 1
  osd scrub thread suicide timeout = 0
  osd snap trim priority = 1
  osd snap trim sleep = 1.0
  public network = 10.88.7.0/24
  
[mon]
  mon allow pool delete = false
  mon health preluminous compat warning = false
  osd pool default flag hashpspool = true




On Thu, 11 Nov 2021 09:16:20 -0600
Sage Weil <sage@xxxxxxxxxxxx> wrote:

> Hi Manuel,
> 
> Before giving up and putting in an off switch, I'd like to understand
> why it is taking as long as it is for the PGs to go active.
> 
> Would you consider enabling debug_osd=10 and debug_ms=1 on your OSDs,
> and debug_mon=10 + debug_ms=1 on the mons, and reproducing this
> (without the patch applied this time of course!)?  The logging will
> slow things down a bit but hopefully the behavior will be close
> enough to what you see normally that we can tell what is going on
> (and presumably picking out the pg that was most laggy will highlight
> the source(s) of the delay).
> 
> sage
> 
> On Wed, Nov 10, 2021 at 4:41 AM Manuel Lausch <manuel.lausch@xxxxxxxx>
> wrote:
> 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


