Hi Chris.

> Interestingly, when duration gets long and performance gets bad ...

This observation is likely due to the MDS and client caches. My experience with ceph's cache implementations is that, well, they seem not that great. I think what we both observe is that with an empty cache everything works fine. As soon as the cache starts saturating, memory needs to be freed, and this seems to result in heavy fragmentation, making certain operations (for example, alloc and free) slower and slower. There was a ceph-users thread discussing performance as a function of cache size, and the finding was that performance increases as the cache size is reduced. Since then I use the following MDS settings (a sketch for applying them at runtime follows in the P.S.):

client_cache_size = 8192
mds_cache_memory_limit = 17179869184
mds_cache_reservation = 0.500000
mds_max_caps_per_client = 65536
mds_min_caps_per_client = 4096
mds_recall_max_caps = 32768

I'm thinking about increasing the mid-point (mds_cache_reservation) even further, so that a large amount of cache memory stays free to absorb load bursts and is released when the burst is over.

About your graphs: IO peaks after an MDS failover are expected. Clients continue to issue IO requests, which land in the system's buffers; once the MDS is back up, these buffered ops get applied, leading to a temporary increase in load. The only exceptional rise is the last one, which might have coincided with someone starting to do a lot of IO.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
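
P.S. In case it is useful, here is a minimal sketch of applying the settings above at runtime through the centralized config database, assuming a Mimic-or-later cluster (on older releases the values go into ceph.conf and the daemons need a restart). The option names and values are exactly the ones from my list; the syntax is the standard ceph CLI:

    # MDS-side cache and caps settings
    ceph config set mds mds_cache_memory_limit 17179869184
    ceph config set mds mds_cache_reservation 0.500000
    ceph config set mds mds_max_caps_per_client 65536
    ceph config set mds mds_min_caps_per_client 4096
    ceph config set mds mds_recall_max_caps 32768
    # client-side cache size
    ceph config set client client_cache_size 8192

A running MDS should pick these up without a restart; "ceph config show mds.<name>" (with your daemon's id in place of <name>) shows the values a daemon is actually using.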