Re: How to reduce CephFS num_strays effectively?

Hi,

If the strays are increasing, it usually means there are references still
lingering around. You can try evaluating the strays in ~mdsdir [0]. If strays
keep increasing at a staggering rate, check whether the deleted files/dirs are
still referenced anywhere (e.g. by snapshots), and, as Eugen mentioned, note
the correlation between mds_bal_fragment_size_max and mds_cache_memory_limit.
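
For example (a rough sketch only, not a recommendation for your exact values;
<name> is a placeholder, and the 200k figure follows the SUSE article Eugen
linked, which suggests doubling the 100k default):

  ceph config set mds mds_bal_fragment_size_max 200000
  # check the current cache limit on the active MDS (run on its host):
  ceph daemon mds.<name> config get mds_cache_memory_limit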

Also, since the client is doing a huge 10TiB file deletion, can you show me
what the purge_queue looks like?
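
For instance (a minimal sketch, run on the host of the active MDS; <name> is a
placeholder for your MDS daemon id):

  ceph daemon mds.<name> perf dump purge_queue

The pq_* counters in that output (e.g. pq_executing, pq_executed; exact names
can vary by release) show whether the purge queue is draining or backing up.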

[0]
https://docs.ceph.com/en/reef/cephfs/scrub/#evaluate-strays-using-recursive-scrub
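
If it helps, the recursive scrub from [0] can be kicked off with something like
the following (syntax taken from the current docs; on an older release such as
14.2.x the MDS may need to be addressed by daemon name rather than <fsname>:0):

  ceph tell mds.<fsname>:0 scrub start ~mdsdir recursive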

*Dhairya Parmar*

Software Engineer, CephFS


On Tue, Feb 18, 2025 at 5:32 AM Jinfeng Biao <Jinfeng.Biao@xxxxxxxxxx>
wrote:

> Hello Eugen and all,
>
> Thanks for the reply. We’ve checked the SUSE doc before raising the limit
> twice, from 100k to 125k, then to 150k.
>
> We are a bit worried about the continuous growth of strays at 50K a day
> and would like to find an effective way to reduce them.
>
> Last night there was another 30K increase in the strays.
>
> Thanks
> Jinfeng
>
>
> From: Eugen Block <eblock@xxxxxx>
> Date: Sunday, 16 February 2025 at 7:32 PM
> To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> Subject:  Re: How to reduce CephFS num_strays effectively?
>
>
> Hi,
>
> This SUSE article [0] covers that; it helped us with a customer a few
> years ago. The recommendation was to double the
> mds_bal_fragment_size_max (default 100k) to 200k, which worked nicely
> for them. Also note the mentioned correlation between
> mds_bal_fragment_size_max and mds_cache_memory_limit.
>
> Regards,
> Eugen
>
> [0]
> https://www.suse.com/de-de/support/kb/doc/?id=000020569
>
> Zitat von jinfeng.biao@xxxxxxxxxx:
>
> > Hello folks,
> >
> > We had an issue where num_strays hit 1 million recently. As a
> > workaround, mds_bal_fragment_size_max was increased to 125,000.
> >
> > The stray_num keeps growing at 25k per day. After we recently observed
> > a 10TiB file deletion, the relevant application was stopped.
> >
> > Then we increased the purge options to the values below:
> >
> >   mds   advanced   filer_max_purge_ops        40
> >   mds   advanced   mds_max_purge_files        1024
> >   mds   advanced   mds_max_purge_ops          32768
> >   mds   advanced   mds_max_purge_ops_per_pg   3
> >
> > We also ran "du -hsx" against the top-level directory mounted by the app
> > that does the massive deletions.
> >
> > Despite all of the above, strays are still growing at 60K per day.
> >
> > There are many more applications using this CephFS filesystem, and only
> > this app has been observed to perform deletions at this scale.
> >
> > I'm wondering what would be an effective way to clean up the strays in
> > this situation while making the least impact on production.
> >
> > Note: We are on 14.2.6
> >
> > thanks
> > James Biao
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



