There is no definitive answer when it comes to MDS tuning. As is
mentioned everywhere, it's about finding the right setup for your
specific workload. If you can synthesize your workload (maybe scaled
down a bit), try optimizing it in a test cluster without interrupting
your developers too much.
But what you haven't explained yet is what you are actually experiencing
as a performance issue. Do you have numbers or a detailed description?
From the fs status output you didn't seem to have too much activity
going on (around 140 requests per second), but that's probably not the
usual traffic? What does ceph report in its client IO output?
Can you paste the 'ceph osd df' output as well?
Do you have dedicated MDS servers or are they colocated with other services?
Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
Hello Eugen.
I read all of your MDS related topics and thank you so much for your effort
on this.
There is not much information and I couldn't find an MDS tuning guide at
all. It seems that you are the right person to discuss MDS debugging and
tuning with.
Do you have any documents, or could you tell me the proper way to debug
the MDS and clients?
Which debug logs will guide me to understand the limitations and will help
to tune according to the data flow?
While searching, I found this:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
quote:"A user running VSCodium, keeping 15k caps open.. the opportunistic
caps recall eventually starts recalling those but the (el7 kernel) client
won't release them. Stopping Codium seems to be the only way to release."
Because of this I think I also need to play around with the client side too.
My main goal is increasing the speed and reducing the latency and I wonder
if these ideas are correct or not:
- Maybe I need to increase client side cache size because via each client,
multiple users request a lot of objects and clearly the
client_cache_size=16 default is not enough.
- Maybe I need to increase the client-side object cache limits, both the
object count "client_oc_max_objects=1000 to 10000" and the data size
"client_oc_size=200mi to 400mi".
- The client cache cleaning threshold is not aggressive enough to keep the
free cache size in the desired range. I need to make it more aggressive,
but without reducing speed or increasing latency.
mds_cache_memory_limit=4gi to 16gi
client_oc_max_objects=1000 to 10000
client_oc_size=200mi to 400mi
client_permissions=false #to reduce latency.
client_cache_size=16 to 128
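
As a rough sketch only, the proposed target values above can be turned into `ceph config set` command lines. The helper names (`to_bytes`, `config_commands`) and the `PROPOSED` table are made up for illustration. The `mds` and `client` config sections are an assumption about where each option belongs, and the sizes are converted to plain bytes because I am not sure every release accepts suffixes like "gi". The script only prints the commands and never runs them:

```python
import re

# Binary unit multipliers for the "ki/mi/gi/ti" suffixes used in this thread.
_UNITS = {"": 1, "ki": 1024, "mi": 1024**2, "gi": 1024**3, "ti": 1024**4}


def to_bytes(size: str) -> int:
    """Convert a string such as '16gi' or '400mi' to a byte count."""
    match = re.fullmatch(r"(\d+)\s*([kmgt]i)?", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or ""]


# (config section, option name, proposed value) taken from the list above.
# The section names are an assumption, not something confirmed in the thread.
PROPOSED = [
    ("mds", "mds_cache_memory_limit", to_bytes("16gi")),
    ("client", "client_oc_max_objects", 10000),
    ("client", "client_oc_size", to_bytes("400mi")),
    ("client", "client_permissions", "false"),
    ("client", "client_cache_size", 128),
]


def config_commands(settings):
    """Build the command lines as strings; nothing is executed here."""
    return [f"ceph config set {who} {name} {value}" for who, name, value in settings]


# Print the commands so they can be reviewed before anyone runs them.
for command in config_commands(PROPOSED):
    print(command)
```

As far as I understand, the client_* options are read by ceph-fuse and libcephfs clients, so they would probably not change anything for kernel mounts. Applying one change at a time on a test cluster would make it easier to see which setting actually has an effect.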
What do you think?