Hi,
this SUSE article [0] covers that; it helped us with a customer a few
years ago. The recommendation was to double
mds_bal_fragment_size_max (default 100k) to 200k, which worked nicely
for them. Also note the correlation the article mentions between
mds_bal_fragment_size_max and mds_cache_memory_limit.
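For reference, the change can usually be applied through the config
store, roughly like this (a sketch; 200k is simply what worked for
that customer, and your limits may differ):

    # Double the per-fragment size limit from its 100k default:
    ceph config set mds mds_bal_fragment_size_max 200000
    # Check the MDS cache limit alongside it, since the two interact:
    ceph config get mds mds_cache_memory_limit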
Regards,
Eugen
[0] https://www.suse.com/de-de/support/kb/doc/?id=000020569
Quoting jinfeng.biao@xxxxxxxxxx:
Hello folks,
We recently had an issue where num_strays hit 1 million. As a
workaround, mds_bal_fragment_size_max was increased to 125,000.
num_strays keeps growing at 25k per day. After we observed a recent
deletion of 10 TiB of files, the relevant application was stopped.
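(For context, the stray counts above come from the MDS perf counters;
on the active MDS they can be read with something like:

    ceph daemon mds.<name> perf dump mds_cache | grep num_strays

where <name> is the active MDS daemon and admin-socket access is
assumed.)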
Then we increased the purging options to the values below:
mds advanced filer_max_purge_ops 40
mds advanced mds_max_purge_files 1024
mds advanced mds_max_purge_ops 32768
mds advanced mds_max_purge_ops_per_pg 3
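(For reference, values like these can be set through the config
store, e.g.:

    ceph config set mds filer_max_purge_ops 40
    ceph config set mds mds_max_purge_files 1024
    ceph config set mds mds_max_purge_ops 32768
    ceph config set mds mds_max_purge_ops_per_pg 3

This is a sketch; depending on the deployment, "ceph tell mds.*
injectargs" may be needed for the change to take effect at runtime.)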
We also ran "du -hsx" on the top-level directory mounted by the app
that does the massive deletions.
Despite all of the above, strays are still growing at 60k per day.
There are many more applications using this CephFS filesystem, and
only this app has been observed performing deletions at this scale.
I'm wondering what would be an effective way to clean up the strays
in this situation while making the least impact on production.
Note: we are on Ceph 14.2.6 (Nautilus).
Thanks,
James Biao
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx