On Mon, 2022-08-22 at 11:13 +0000, Frank Schilder wrote:
> Hi Chris.
>
> > Interestingly, when duration gets long and performance gets bad ...
>
> This observation is likely due to MDS and client cache. My experience
> with ceph's cache implementations is that, well, they seem not that
> great. I think what we both observe is, that with an empty cache
> everything works fine. As soon as the cache starts saturating, memory
> needs to be freed. This seems to result in heavy fragmentation,
> making certain operations slower and slower (for example, alloc and
> free).

OK thanks, makes sense. The interesting thing to me was how long it
takes to acquire locks specifically, which suggests the problem is more
on the client side than the MDS side (though I'm not certain).

> There was a ceph-user thread discussing performance as a function of
> cache size and the finding was that performance increases with
> reducing cache size.

Right, this is the kind of thing I was wondering about too - that
reducing the cache might actually help keep things under control
earlier. Thanks!

> Since then I use the following mds settings:
>
> client_cache_size = 8192
> mds_cache_memory_limit = 17179869184
> mds_cache_reservation = 0.500000
> mds_max_caps_per_client = 65536
> mds_min_caps_per_client = 4096
> mds_recall_max_caps = 32768

Thanks a lot. I plan to start slowly testing some changes with these
settings and monitor the results (I've put a rough sketch of how I'd
apply them at the end of this mail). Hopefully I can arrive at values
that work best for this cluster.

> I'm thinking about increasing the mid-point (mds_cache_reservation)
> even further to have a large amount of free cache mem to absorb load
> bursts and release when the burst is over.

Right, that makes sense. Your setting is currently 50%, right? Mine is
at the default 5%, so it definitely seems like a good thing to
increase. If I understand correctly, this reserves more memory for new
ops, which should mean those ops don't come under as much pressure when
the cache is already full (assuming the MDS can clear out the cache
fast enough).

> About your graphs: to have IO-peaks after MDS failover is expected.
> Clients actually continue to make IO requests that go to system's
> buffer. Once the MDS is back up, these buffered OPS get applied,
> leading to a temporary increase of load. The only exceptional raise
> is the last one, which might have coincided with someone starting to
> do a lot of IO.

OK, so my analysis was probably wrong then: it's not that the clients
write out their caches because the MDS is going away, it's that while
the MDS is away they keep caching, and then we see more writes when it
comes back. The last peak is interesting in that the MDS had been up
for a much longer time, which I thought explained why it is higher -
i.e. there was more client data cached that got flushed...

Finally, do you know of anything I can do on the client side to help
keep its cache down? I'm also thinking of testing mounts with the
noatime and nodiratime options, which might at least reduce the number
of MDS ops (example at the end of this mail), but I'm not sure of any
other client-side settings or tweaks that might help.

Thanks again for sticking with this thread and helping with all your
answers. I have learned a lot (still lots to go!). I guess the next
steps for me are to see whether manipulating the MDS cache and caps
settings helps at all. But ultimately I need to upgrade, expand and
tune the cluster...
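In case it helps anyone following along, here is a rough sketch of how
I plan to try those values - assuming a release new enough to use the
centralized config database (ceph config set); on an older cluster I'd
put the same keys under [mds] in ceph.conf and restart the daemons. I
believe client_cache_size is actually a client-side option
(ceph-fuse/libcephfs), so I've put it under the client section - please
correct me if that's wrong:

    # MDS cache sizing / caps recall settings (values from Frank's mail)
    ceph config set mds mds_cache_memory_limit  17179869184
    ceph config set mds mds_cache_reservation   0.500000
    ceph config set mds mds_max_caps_per_client 65536
    ceph config set mds mds_min_caps_per_client 4096
    ceph config set mds mds_recall_max_caps     32768
    # client-side metadata cache (ceph-fuse/libcephfs clients)
    ceph config set client client_cache_size    8192

And for the mount-option test, something like the following for the
kernel client - the monitor address, mount point, user name and secret
file below are just placeholders for my setup:

    mount -t ceph mon1:6789:/ /mnt/cephfs \
        -o name=cephfs_user,secretfile=/etc/ceph/cephfs.secret,noatime,nodiratime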
Cheers,
-c

> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx