Re: What is client request_load_avg? Troubleshooting MDS issues on Luminous

Chris Smart <distroguy@xxxxxxxxx> · Wed, 17 Aug 2022 21:43:00 +1000

On Wed, 2022-08-17 at 17:10 +1000, Chris Smart wrote:
> Looking at the MDS ops in flight, the majority are journal_and_reply:
> 
> $ sudo ceph daemon mds.$(hostname) dump_ops_in_flight | grep 'flag_point' | sort | uniq -c
>      28                 "flag_point": "failed to rdlock, waiting",
>       2                 "flag_point": "failed to wrlock, waiting",
>      18                 "flag_point": "failed to xlock, waiting",
>     418                 "flag_point": "submit entry: journal_and_reply",
> 
> Does anyone know where I can find more info on what journal_and_reply
> means? Is it solely about reading and writing to the metadata pool, or
> is it waiting for OSDs to perform some action (like ensuring a file is
> gone, so that it can then write to the metadata pool, perhaps)?

Ohhhh, is "journal_and_reply" actually the very last event in a
successful operation?...[1] No wonder so many are the last event...
:facepalm:

OK, well, assuming that, I can probably look for ops which have both a
journal_and_reply event and a long duration, and see what they got
stuck on... then maybe work out what that stuck event means.
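
A rough sketch of that filter in Python, in case it is useful. It assumes
the op dump is JSON with a top-level "ops" list, where each op carries a
"duration", a "description" and a "type_data" object holding an "events"
list of {"time", "event"} entries. That layout and the two timestamp
formats tried below are assumptions and may differ between releases; the
function names are made up for illustration.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse an event timestamp. Both formats tried here are assumptions."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S.%f%z"):
        try:
            return datetime.strptime(stamp, fmt)
        except ValueError:
            pass
    raise ValueError(f"unrecognised timestamp: {stamp!r}")


def slow_journaled_ops(dump, min_duration=1.0):
    """Yield (duration, description, stuck_event, gap_seconds) for ops that
    reached journal_and_reply and took at least min_duration seconds."""
    for op in dump.get("ops", []):
        # Assumed layout: type_data is an object with an "events" list.
        events = op.get("type_data", {}).get("events", [])
        if not any("journal_and_reply" in e["event"] for e in events):
            continue
        if op.get("duration", 0.0) < min_duration:
            continue
        # Find the longest gap between consecutive events. The event that
        # starts the gap is the state the op sat in while it waited.
        worst_gap, stuck_event = 0.0, None
        for prev, cur in zip(events, events[1:]):
            gap = (parse_time(cur["time"]) - parse_time(prev["time"])).total_seconds()
            if gap > worst_gap:
                worst_gap, stuck_event = gap, prev["event"]
        yield op["duration"], op.get("description", ""), stuck_event, worst_gap
```

To try it, save the JSON from the dump command quoted above to a file, load
it with json.load(), and pass the result to slow_journaled_ops(). Sorting
the output by the first field puts the slowest ops at the end.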

[1]
https://github.com/ceph/ceph/blob/d54685879d59f2780035623e40e31115d80dabb1/src/mds/Server.cc#L1925

-c
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx