What is client request_load_avg? Troubleshooting MDS issues on Luminous

Hi all,

I have recently inherited a 10-node Ceph cluster running Luminous (12.2.12)
which exists specifically to serve CephFS (and I don't know much about MDS),
with only one active MDS server (two standby).
It's not a great cluster, in my opinion: the cephfs_data pool is on
high-density nodes with high-capacity SATA drives, but at least the
cephfs_metadata pool is on NVMe drives.

Access to the cluster regularly slows down for clients, and I'm seeing lots
of warnings like these:

MDSs behind on trimming (MDS_TRIM)
MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
MDSs report slow requests (MDS_SLOW_REQUEST)
MDSs have many clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE_MANY)

If there is only one client failing to respond to capability release, I can
see its client id in the health output, work out which user it belongs to,
and get their job stopped. Performance then usually improves a bit.
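
For a single offender, what I do today is roughly this (a sketch from
memory; the jq paths are my assumptions and may not match your version
exactly):

    # Pull the offending client id out of the health output
    ceph health detail | grep -A2 MDS_CLIENT_LATE_RELEASE

    # Map that client id to a hostname via the MDS session list
    ceph tell mds.<name> client ls |
        jq -r '.[] | select(.id == <client_id>) | .client_metadata.hostname'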

However, if there is more than one, the output only shows a summary count
of the clients, so I don't know whose jobs to cancel.
Is there a way I can work out which clients these are? I'm guessing it
takes some combination of in_flight_ops, blocked_ops and total num_caps?
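
Something like the following is what I had in mind, though I have no idea
whether it's the right approach (the field names are assumptions based on
the JSON I see; adjust as needed):

    # Clients appearing in blocked and in-flight ops on the active MDS host
    ceph daemon mds.<name> dump_blocked_ops | grep client
    ceph daemon mds.<name> dump_ops_in_flight | grep client

    # Sessions sorted by cap count, highest first
    ceph tell mds.<name> client ls |
        jq -r 'sort_by(-.num_caps) | .[] |
               "\(.id)\t\(.num_caps)\t\(.client_metadata.hostname)"' | head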

That said, a large number of caps isn't _necessarily_ an indicator of a
problem by itself: sometimes restarting the MDS and forcing clients to drop
unused caps helps, sometimes it doesn't.

I'm curious whether there's a better way to determine which clients might
be causing issues in the cluster.
To that end, I've noticed a metric called "request_load_avg" in the output
of "ceph tell mds.<name> client ls", but I can't find any documentation on
it. It _seems_ like it could indicate a client that is issuing lots and
lots of requests, and would therefore be a useful metric for spotting which
client might be hammering the cluster, but does anyone know for sure?
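
If it means what I think it means, I'd hope to use it something like this
(again just a sketch; I'm assuming request_load_avg is a top-level field of
each session entry, as it appears to be in my output):

    # Sessions sorted by request_load_avg, busiest first
    ceph tell mds.<name> client ls |
        jq -r 'sort_by(-.request_load_avg) | .[] |
               "\(.id)\t\(.request_load_avg)\t\(.client_metadata.hostname)"' | head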

Many thanks,
Chris