Hey Frank,
I hate to sound like a broken record here, but if you can access any of
the directories handled by rank 2, try running a 'find /path/to/dir/ -ls'
on some of them and see if num_strays decreases. I've had that help the
last time we had an MDS in that state.
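Roughly like this (the path is of course just a placeholder, the daemon name
is taken from your rank list below, and the perf dump has to run on the node
hosting that MDS):

# check the current stray count on the rank 2 MDS
ceph daemon mds.ceph-21-mds perf dump | grep strays

# recursively stat the subtree; this is what sometimes kicks the stray count down
find /path/to/dir/ -ls > /dev/null

# check again to see whether the count dropped
ceph daemon mds.ceph-21-mds perf dump | grep strays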
Regards,
Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868
On 1/17/25 10:02, Frank Schilder wrote:
Hi Bailey.
ceph-14 (rank=0): num_stray=205532
ceph-13 (rank=1): num_stray=4446
ceph-21-mds (rank=2): num_stray=99446249
ceph-23 (rank=3): num_stray=3412
ceph-08 (rank=4): num_stray=1238
ceph-15 (rank=5): num_stray=1486
ceph-16 (rank=6): num_stray=5545
ceph-11 (rank=7): num_stray=2995
The stats for rank 2 are almost certainly out of date, though. The config dump is large, but since you asked, here it is. Only 3 of the settings are there for maintenance and workaround reasons: mds_beacon_grace, auth_service_ticket_ttl and mon_osd_report_timeout. The last one is for a different issue, though.
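(For reference, since these live in the mon config store they should be easy to
roll back later with something like

ceph config rm global mds_beacon_grace
ceph config rm global auth_service_ticket_ttl

once the MDS situation is back to normal.)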
WHO MASK LEVEL OPTION VALUE RO
global advanced auth_service_ticket_ttl 129600.000000
global advanced mds_beacon_grace 1209600.000000
global advanced mon_pool_quota_crit_threshold 90
global advanced mon_pool_quota_warn_threshold 70
global dev mon_warn_on_pool_pg_num_not_power_of_two false
global advanced osd_map_message_max_bytes 16384
global advanced osd_op_queue wpq *
global advanced osd_op_queue_cut_off high *
global advanced osd_pool_default_pg_autoscale_mode off
mon advanced mon_allow_pool_delete false
mon advanced mon_osd_down_out_subtree_limit host
mon advanced mon_osd_min_down_reporters 3
mon advanced mon_osd_report_timeout 86400
mon advanced mon_osd_reporter_subtree_level host
mon advanced mon_pool_quota_warn_threshold 70
mon advanced mon_sync_max_payload_size 4096
mon advanced mon_warn_on_insecure_global_id_reclaim false
mon advanced mon_warn_on_insecure_global_id_reclaim_allowed false
mgr advanced mgr/balancer/active false
mgr advanced mgr/dashboard/ceph-01/server_addr 10.40.88.65 *
mgr advanced mgr/dashboard/ceph-02/server_addr 10.40.88.66 *
mgr advanced mgr/dashboard/ceph-03/server_addr 10.40.88.67 *
mgr advanced mgr/dashboard/server_port 8443 *
mgr advanced mon_pg_warn_max_object_skew 10.000000
mgr basic target_max_misplaced_ratio 1.000000
osd advanced bluefs_buffered_io true
osd advanced bluestore_compression_min_blob_size_hdd 262144
osd advanced bluestore_compression_min_blob_size_ssd 65536
osd advanced bluestore_compression_mode aggressive
osd class:rbd_perf advanced bluestore_compression_mode none
osd dev bluestore_fsck_quick_fix_on_mount false
osd advanced osd_deep_scrub_randomize_ratio 0.000000
osd class:hdd advanced osd_delete_sleep 300.000000
osd advanced osd_fast_shutdown false
osd class:fs_meta advanced osd_max_backfills 12
osd class:hdd advanced osd_max_backfills 3
osd class:rbd_data advanced osd_max_backfills 6
osd class:rbd_meta advanced osd_max_backfills 12
osd class:rbd_perf advanced osd_max_backfills 12
osd class:ssd advanced osd_max_backfills 12
osd advanced osd_max_backfills 3
osd class:fs_meta dev osd_memory_cache_min 2147483648
osd class:hdd dev osd_memory_cache_min 1073741824
osd class:rbd_data dev osd_memory_cache_min 2147483648
osd class:rbd_meta dev osd_memory_cache_min 1073741824
osd class:rbd_perf dev osd_memory_cache_min 2147483648
osd class:ssd dev osd_memory_cache_min 2147483648
osd dev osd_memory_cache_min 805306368
osd class:fs_meta basic osd_memory_target 6442450944
osd class:hdd basic osd_memory_target 3221225472
osd class:rbd_data basic osd_memory_target 4294967296
osd class:rbd_meta basic osd_memory_target 2147483648
osd class:rbd_perf basic osd_memory_target 6442450944
osd class:ssd basic osd_memory_target 4294967296
osd basic osd_memory_target 2147483648
osd class:rbd_perf advanced osd_op_num_threads_per_shard 4 *
osd class:hdd advanced osd_recovery_delay_start 600.000000
osd class:rbd_data advanced osd_recovery_delay_start 300.000000
osd class:rbd_perf advanced osd_recovery_delay_start 300.000000
osd class:fs_meta advanced osd_recovery_max_active 32
osd class:hdd advanced osd_recovery_max_active 8
osd class:rbd_data advanced osd_recovery_max_active 16
osd class:rbd_meta advanced osd_recovery_max_active 32
osd class:rbd_perf advanced osd_recovery_max_active 16
osd class:ssd advanced osd_recovery_max_active 32
osd advanced osd_recovery_max_active 8
osd class:fs_meta advanced osd_recovery_sleep 0.002500
osd class:hdd advanced osd_recovery_sleep 0.050000
osd class:rbd_data advanced osd_recovery_sleep 0.025000
osd class:rbd_meta advanced osd_recovery_sleep 0.002500
osd class:rbd_perf advanced osd_recovery_sleep 0.010000
osd class:ssd advanced osd_recovery_sleep 0.002500
osd advanced osd_recovery_sleep 0.050000
osd class:hdd dev osd_scrub_backoff_ratio 0.330000
osd class:hdd advanced osd_scrub_during_recovery true
osd advanced osd_scrub_load_threshold 0.750000
osd class:fs_meta advanced osd_snap_trim_sleep 0.050000
osd class:hdd advanced osd_snap_trim_sleep 2.000000
osd class:rbd_data advanced osd_snap_trim_sleep 0.100000
mds basic client_cache_size 8192
mds advanced defer_client_eviction_on_laggy_osds false
mds advanced mds_bal_fragment_size_max 100000
mds basic mds_cache_memory_limit 25769803776
mds advanced mds_cache_reservation 0.500000
mds advanced mds_max_caps_per_client 65536
mds advanced mds_min_caps_per_client 4096
mds advanced mds_recall_max_caps 32768
mds advanced mds_session_blocklist_on_timeout false
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Bailey Allison <ballison@xxxxxxxxxxxx>
Sent: Thursday, January 16, 2025 10:08 PM
To: ceph-users@xxxxxxx
Subject: Re: MDS hung in purge_stale_snap_data after populating cache
Frank,
Are you able to share an up-to-date ceph config dump and a ceph daemon
mds.X perf dump | grep strays from the cluster?
We're just getting through our comically long Ceph outage, so I'd like
to be able to share the love here hahahaha.
Regards,
Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx