On Wed, 2022-08-17 at 17:10 +1000, Chris Smart wrote:
> Looking at the MDS ops in flight, the majority are journal_and_reply:
>
> $ sudo ceph daemon mds.$(hostname) dump_ops_in_flight |grep 'flag_point' |sort |uniq -c
>      28 "flag_point": "failed to rdlock, waiting",
>       2 "flag_point": "failed to wrlock, waiting",
>      18 "flag_point": "failed to xlock, waiting",
>     418 "flag_point": "submit entry: journal_and_reply",
>
> Does anyone know where I can find more info as to what
> journal_and_reply means? Is it solely about reading and writing to the
> metadata pool, or is it waiting for OSDs to perform some action (like
> ensuring a file is gone, so that it can then write to the metadata
> pool, perhaps)?

Ohhhh, is "journal_and_reply" actually the very last event in a
successful operation? [1]

No wonder it is the last event for so many of them... :facepalm:

OK, well, assuming that, I can probably look for ops which have both a
journal_and_reply event and a long duration, and see which earlier event
they got stuck on. Then maybe I can work out what that stuck event means.

[1] https://github.com/ceph/ceph/blob/d54685879d59f2780035623e40e31115d80dabb1/src/mds/Server.cc#L1925

-c
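
A rough sketch of the filtering described above, in Python. It assumes the JSON from dump_ops_in_flight has a top-level "ops" list where each op carries "description", "duration" and "type_data" -> "events" (each event having an "event" string); those field names may differ between Ceph releases, so check them against real output first. The function name, the 5-second threshold and the SAMPLE data are made up for illustration and are not real MDS output.

```python
def slow_journaled_ops(dump, min_duration=5.0, marker="journal_and_reply"):
    """Return (duration, description, events_before_marker) for slow ops.

    Only ops that took at least `min_duration` seconds AND logged an event
    containing `marker` are kept. Results are sorted slowest first.
    Field names ("ops", "duration", "type_data", "events", "event") are
    assumptions about the dump_ops_in_flight JSON layout.
    """
    results = []
    for op in dump.get("ops", []):
        duration = float(op.get("duration", 0.0))
        raw_events = op.get("type_data", {}).get("events", [])
        events = [e.get("event", "") for e in raw_events]
        # Index of the first event that mentions the marker, if any.
        idx = next((i for i, e in enumerate(events) if marker in e), None)
        if idx is None or duration < min_duration:
            continue
        # Keep only what happened before the journal_and_reply event,
        # since that is where the op would have been waiting.
        results.append((duration, op.get("description", ""), events[:idx]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Hypothetical sample, hand-written for illustration only.
SAMPLE = {
    "ops": [
        {
            "description": "client_request(hypothetical op A)",
            "duration": 12.5,
            "type_data": {"events": [
                {"event": "initiated"},
                {"event": "failed to xlock, waiting"},
                {"event": "submit entry: journal_and_reply"},
            ]},
        },
        {
            "description": "client_request(hypothetical op B)",
            "duration": 0.2,
            "type_data": {"events": [
                {"event": "initiated"},
                {"event": "submit entry: journal_and_reply"},
            ]},
        },
        {
            "description": "client_request(hypothetical op C)",
            "duration": 30.0,
            "type_data": {"events": [
                {"event": "initiated"},
                {"event": "failed to rdlock, waiting"},
            ]},
        },
    ]
}

for duration, description, events in slow_journaled_ops(SAMPLE):
    print(f"{duration:8.2f}s  {description}")
    for name in events:
        print(f"            {name}")
```

To run it against a live MDS, the dump could be saved to a file with the same ceph daemon command as above and loaded with json.load() in place of SAMPLE.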