Hi Bailey.

ceph-14     (rank=0): num_stray=205532
ceph-13     (rank=1): num_stray=4446
ceph-21-mds (rank=2): num_stray=99446249
ceph-23     (rank=3): num_stray=3412
ceph-08     (rank=4): num_stray=1238
ceph-15     (rank=5): num_stray=1486
ceph-16     (rank=6): num_stray=5545
ceph-11     (rank=7): num_stray=2995

The stats for rank 2 are almost certainly out of date, though.
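In case it is useful, the numbers above can be refreshed with something along these lines. This is only a rough sketch: it assumes the MDS daemon names double as host names (which roughly holds here, adjust to your deployment) and that ssh access to those hosts is available, since "ceph daemon" only talks to the local admin socket:

  # Rough sketch: pull the strays counters from every active MDS.
  # Adjust the daemon list to whatever "ceph fs status" reports.
  for mds in ceph-14 ceph-13 ceph-21-mds ceph-23 ceph-08 ceph-15 ceph-16 ceph-11; do
      echo "=== mds.${mds} ==="
      ssh "${mds}" "ceph daemon mds.${mds} perf dump" | grep strays
  done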
The config dump is large, but since you asked, here it is. It's only 3 settings that are present for maintenance and workaround reasons: mds_beacon_grace, auth_service_ticket_ttl and mon_osd_report_timeout. The last one is for a different issue, though.

WHO     MASK            LEVEL     OPTION                                          VALUE           RO
global                  advanced  auth_service_ticket_ttl                         129600.000000
global                  advanced  mds_beacon_grace                                1209600.000000
global                  advanced  mon_pool_quota_crit_threshold                   90
global                  advanced  mon_pool_quota_warn_threshold                   70
global                  dev       mon_warn_on_pool_pg_num_not_power_of_two        false
global                  advanced  osd_map_message_max_bytes                       16384
global                  advanced  osd_op_queue                                    wpq *
global                  advanced  osd_op_queue_cut_off                            high *
global                  advanced  osd_pool_default_pg_autoscale_mode              off
mon                     advanced  mon_allow_pool_delete                           false
mon                     advanced  mon_osd_down_out_subtree_limit                  host
mon                     advanced  mon_osd_min_down_reporters                      3
mon                     advanced  mon_osd_report_timeout                          86400
mon                     advanced  mon_osd_reporter_subtree_level                  host
mon                     advanced  mon_pool_quota_warn_threshold                   70
mon                     advanced  mon_sync_max_payload_size                       4096
mon                     advanced  mon_warn_on_insecure_global_id_reclaim          false
mon                     advanced  mon_warn_on_insecure_global_id_reclaim_allowed  false
mgr                     advanced  mgr/balancer/active                             false
mgr                     advanced  mgr/dashboard/ceph-01/server_addr               10.40.88.65 *
mgr                     advanced  mgr/dashboard/ceph-02/server_addr               10.40.88.66 *
mgr                     advanced  mgr/dashboard/ceph-03/server_addr               10.40.88.67 *
mgr                     advanced  mgr/dashboard/server_port                       8443 *
mgr                     advanced  mon_pg_warn_max_object_skew                     10.000000
mgr                     basic     target_max_misplaced_ratio                      1.000000
osd                     advanced  bluefs_buffered_io                              true
osd                     advanced  bluestore_compression_min_blob_size_hdd         262144
osd                     advanced  bluestore_compression_min_blob_size_ssd         65536
osd                     advanced  bluestore_compression_mode                      aggressive
osd     class:rbd_perf  advanced  bluestore_compression_mode                      none
osd                     dev       bluestore_fsck_quick_fix_on_mount               false
osd                     advanced  osd_deep_scrub_randomize_ratio                  0.000000
osd     class:hdd       advanced  osd_delete_sleep                                300.000000
osd                     advanced  osd_fast_shutdown                               false
osd     class:fs_meta   advanced  osd_max_backfills                               12
osd     class:hdd       advanced  osd_max_backfills                               3
osd     class:rbd_data  advanced  osd_max_backfills                               6
osd     class:rbd_meta  advanced  osd_max_backfills                               12
osd     class:rbd_perf  advanced  osd_max_backfills                               12
osd     class:ssd       advanced  osd_max_backfills                               12
osd                     advanced  osd_max_backfills                               3
osd     class:fs_meta   dev       osd_memory_cache_min                            2147483648
osd     class:hdd       dev       osd_memory_cache_min                            1073741824
osd     class:rbd_data  dev       osd_memory_cache_min                            2147483648
osd     class:rbd_meta  dev       osd_memory_cache_min                            1073741824
osd     class:rbd_perf  dev       osd_memory_cache_min                            2147483648
osd     class:ssd       dev       osd_memory_cache_min                            2147483648
osd                     dev       osd_memory_cache_min                            805306368
osd     class:fs_meta   basic     osd_memory_target                               6442450944
osd     class:hdd       basic     osd_memory_target                               3221225472
osd     class:rbd_data  basic     osd_memory_target                               4294967296
osd     class:rbd_meta  basic     osd_memory_target                               2147483648
osd     class:rbd_perf  basic     osd_memory_target                               6442450944
osd     class:ssd       basic     osd_memory_target                               4294967296
osd                     basic     osd_memory_target                               2147483648
osd     class:rbd_perf  advanced  osd_op_num_threads_per_shard                    4 *
osd     class:hdd       advanced  osd_recovery_delay_start                        600.000000
osd     class:rbd_data  advanced  osd_recovery_delay_start                        300.000000
osd     class:rbd_perf  advanced  osd_recovery_delay_start                        300.000000
osd     class:fs_meta   advanced  osd_recovery_max_active                         32
osd     class:hdd       advanced  osd_recovery_max_active                         8
osd     class:rbd_data  advanced  osd_recovery_max_active                         16
osd     class:rbd_meta  advanced  osd_recovery_max_active                         32
osd     class:rbd_perf  advanced  osd_recovery_max_active                         16
osd     class:ssd       advanced  osd_recovery_max_active                         32
osd                     advanced  osd_recovery_max_active                         8
osd     class:fs_meta   advanced  osd_recovery_sleep                              0.002500
osd     class:hdd       advanced  osd_recovery_sleep                              0.050000
osd     class:rbd_data  advanced  osd_recovery_sleep                              0.025000
osd     class:rbd_meta  advanced  osd_recovery_sleep                              0.002500
osd     class:rbd_perf  advanced  osd_recovery_sleep                              0.010000
osd     class:ssd       advanced  osd_recovery_sleep                              0.002500
osd                     advanced  osd_recovery_sleep                              0.050000
osd     class:hdd       dev       osd_scrub_backoff_ratio                         0.330000
osd     class:hdd       advanced  osd_scrub_during_recovery                       true
osd                     advanced  osd_scrub_load_threshold                        0.750000
osd     class:fs_meta   advanced  osd_snap_trim_sleep                             0.050000
osd     class:hdd       advanced  osd_snap_trim_sleep                             2.000000
osd     class:rbd_data  advanced  osd_snap_trim_sleep                             0.100000
mds                     basic     client_cache_size                               8192
mds                     advanced  defer_client_eviction_on_laggy_osds             false
mds                     advanced  mds_bal_fragment_size_max                       100000
mds                     basic     mds_cache_memory_limit                          25769803776
mds                     advanced  mds_cache_reservation                           0.500000
mds                     advanced  mds_max_caps_per_client                         65536
mds                     advanced  mds_min_caps_per_client                         4096
mds                     advanced  mds_recall_max_caps                             32768
mds                     advanced  mds_session_blocklist_on_timeout                false

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Bailey Allison <ballison@xxxxxxxxxxxx>
Sent: Thursday, January 16, 2025 10:08 PM
To: ceph-users@xxxxxxx
Subject: Re: MDS hung in purge_stale_snap_data after populating cache

Frank,

Are you able to share an up-to-date ceph config dump and ceph daemon mds.X perf dump | grep strays from the cluster? We're just getting through our comically long ceph outage, so I'd like to be able to share the love here hahahaha

Regards,

Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx