Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

Özkan Göksu <ozkangksu@xxxxxxxxx> · Thu, 25 Jan 2024 15:15:46 +0300

Hello Eugen,

I have read all of your MDS-related threads; thank you so much for your
effort on this.
There is not much information available, and I couldn't find an MDS tuning
guide at all. It seems that you are the right person to discuss MDS
debugging and tuning with.

Do you have any documents, or could you tell me the proper way to debug
the MDS and its clients?
Which debug logs would help me understand the limits and tune according to
the data flow?

While searching, I found this:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
quote:"A user running VSCodium, keeping 15k caps open.. the opportunistic
caps recall eventually starts recalling those but the (el7 kernel) client
won't release them. Stopping Codium seems to be the only way to release."

Because of this, I think I also need to tune the client side.
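
To see which clients hold the most caps, the JSON printed by
"ceph tell mds.<name> session ls" can be filtered. A minimal sketch, assuming
each session entry carries "id", "num_caps" and "client_metadata.hostname"
fields; the helper name, the threshold and the sample data below are made up
for illustration:

```python
import json


def clients_over_cap_threshold(session_ls_json, threshold=10000):
    """Return (client_id, hostname, num_caps) for sessions above `threshold` caps.

    `session_ls_json` is expected to be the JSON text of an MDS session list.
    The field names used here are assumptions; missing fields fall back to defaults.
    """
    sessions = json.loads(session_ls_json)
    heavy = []
    for session in sessions:
        caps = session.get("num_caps", 0)
        if caps > threshold:
            host = session.get("client_metadata", {}).get("hostname", "unknown")
            heavy.append((session.get("id"), host, caps))
    # Largest cap holders first.
    return sorted(heavy, key=lambda entry: entry[2], reverse=True)


# Invented sample data, shaped like a session list (not real cluster output).
sample = json.dumps([
    {"id": 4305, "num_caps": 15000, "client_metadata": {"hostname": "node-a"}},
    {"id": 4306, "num_caps": 320, "client_metadata": {"hostname": "node-b"}},
    {"id": 4307, "num_caps": 12000, "client_metadata": {}},
])

for client_id, host, caps in clients_over_cap_threshold(sample):
    print(f"client.{client_id} on {host}: {caps} caps")
```

On a live cluster the input would come from the session list command above,
saved to a file or captured from a subprocess.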

My main goal is to increase speed and reduce latency, and I wonder whether
these ideas are correct:
- Maybe I need to increase the client-side cache size, because multiple
users request a lot of objects through each client and the
client_cache_size=16 default is clearly not enough.
- Maybe I need to raise the client-side object cache limits: the object
count "client_oc_max_objects" from 1000 to 10000 and the data size
"client_oc_size" from 200mi to 400mi.
- The client cache cleaning threshold is not aggressive enough to keep the
free cache size in the desired range. I need to make it more aggressive,
but without reducing speed or increasing latency.

mds_cache_memory_limit=4gi to 16gi
client_oc_max_objects=1000 to 10000
client_oc_size=200mi to 400mi
client_permissions=false #to reduce latency.
client_cache_size=16 to 128
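
A minimal sketch of how the values above could be turned into
"ceph config set" commands, with the sizes expanded to bytes. The target
sections ("mds", "client") are an assumption, the values are taken literally
from the list above, and the script only prints the commands instead of
running them. As far as I know, the client_* options are read by the
userspace clients (ceph-fuse / libcephfs), not by the kernel client.

```python
GI = 1024 ** 3  # bytes in one GiB
MI = 1024 ** 2  # bytes in one MiB

# (section, option, proposed value), copied from the list above.
PROPOSED = [
    ("mds", "mds_cache_memory_limit", 16 * GI),
    ("client", "client_oc_max_objects", 10000),
    ("client", "client_oc_size", 400 * MI),
    ("client", "client_permissions", "false"),
    ("client", "client_cache_size", 128),  # value as written in the post
]


def config_set_commands(settings):
    """Build one `ceph config set <who> <option> <value>` string per setting."""
    return [f"ceph config set {who} {name} {value}" for who, name, value in settings]


# Print for review; nothing is executed against a cluster.
for command in config_set_commands(PROPOSED):
    print(command)
```

Printing first makes it easy to review each change and apply them one at a
time while watching the cache pressure warnings.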

What do you think?