On Mon, Apr 13, 2020 at 4:14 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Mon, Apr 13, 2020 at 1:33 PM Josh Haft <paccrap@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I upgraded from 13.2.5 to 14.2.6 last week and am now seeing
> > significantly higher latency on various MDS operations. For example,
> > the 2min rate of ceph_mds_server_req_create_latency_sum /
> > ceph_mds_server_req_create_latency_count for an 8hr window last Monday,
> > prior to the upgrade, averaged 2ms. Today, however, the same stat shows
> > 869ms. Other operations including open, readdir, rmdir, etc. are also
> > taking significantly longer.
> >
> > Here's a partial example of an op from dump_ops_in_flight:
> >
> >     {
> >         "description": "client_request(client.342513090:334359409 create #...)",
> >         "initiated_at": "2020-04-13 15:30:15.707637",
> >         "age": 0.19583208099999999,
> >         "duration": 0.19767626299999999,
> >         "type_data": {
> >             "flag_point": "submit entry: journal_and_reply",
> >             "reqid": "client.342513090:334359409",
> >             "op_type": "client_request",
> >             "client_info": {
> >                 "client": "client.342513090",
> >                 "tid": 334359409
> >             },
> >             "events": [
> >                 {
> >                     "time": "2020-04-13 15:30:15.707637",
> >                     "event": "initiated"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.707637",
> >                     "event": "header_read"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.707638",
> >                     "event": "throttled"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.707640",
> >                     "event": "all_read"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.781935",
> >                     "event": "dispatched"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.785086",
> >                     "event": "acquired locks"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.785507",
> >                     "event": "early_replied"
> >                 },
> >                 {
> >                     "time": "2020-04-13 15:30:15.785508",
> >                     "event": "submit entry: journal_and_reply"
> >                 }
> >             ]
> >         }
> >     }
> >
> > This, along with every other 'create' op I've seen, has a 50ms+ delay
> > between the all_read and dispatched events - what is happening during
> > this time?
>
> The incoming client request messages are read on a messenger thread,
> and then they get transferred into a queue for the main server event
> loop. "all_read" means the message has been read off the wire (in the
> messenger reader thread), and "dispatched" means the main server loop
> thread has started work on it.
> So a long wait in that period has nothing to do with the particular
> message which is waiting; it just indicates that the server is
> somewhat backed up.

Thanks for the clarification - the total op time does include the wait
between those two events, though - correct?

> In this case, once processing started it took about 3 ms to get locks
> (which IIRC probably means it had to queue and wait for something else
> to happen) and then sent back a reply and queued up the journal event
> (i.e., sent off the write to disk) in another half a millisecond.
>
> Anyway, I don't off-hand know of any reason things should have slowed
> down (really, I'd expect them to be a little faster as a bit more
> stuff got parallelized). Have the number of requests being processed
> by the MDS changed? Are your clients noticing the apparent new
> slowness?

The client workload should be roughly the same, though the number of
requests being processed has dropped from over 6000/s to under 500/s,
presumably because of the increase in time it takes to process each one.
Clients are definitely noticing this increased latency.

Thanks,
Josh

> > I'm not sure what I'm looking for in the MDS debug logs.
> >
> > We have a mix of clients from 12.2.x through 14.2.8; my plan was to
> > upgrade those pre-Nautilus clients this week. There is only a single
> > MDS rank with 1 backup. Other functions of this cluster - RBDs and RGW
> > - do not appear impacted, so this looks limited to the MDS. I did not
> > observe this behavior after upgrading a dev cluster last month.
> >
> > Has anyone seen anything similar? Thanks for any assistance!
> > Josh
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
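
[Editor's note: a minimal sketch of how the average create latency discussed at the top of the thread could be reproduced directly from the MDS admin socket rather than from Prometheus. It assumes `ceph daemon mds.<name> perf dump` exposes `mds_server.req_create_latency` with "avgcount" and "sum" fields; the exact counter layout can vary by release, so treat the field names as assumptions to verify on your own build.]

    #!/usr/bin/env python3
    # Sketch: sample the MDS perf counters twice and print the average create
    # latency over the interval, analogous to the Prometheus
    # rate(..._latency_sum) / rate(..._latency_count) query discussed above.
    # ASSUMPTION: mds_server.req_create_latency carries "avgcount" and "sum".
    import json
    import subprocess
    import sys
    import time

    def sample(mds_name):
        out = subprocess.check_output(
            ["ceph", "daemon", f"mds.{mds_name}", "perf", "dump"])
        lat = json.loads(out)["mds_server"]["req_create_latency"]
        return lat["avgcount"], lat["sum"]  # number of creates, total seconds

    def main():
        mds_name = sys.argv[1] if len(sys.argv) > 1 else "a"
        interval = 120  # seconds, matching the 2min rate window in the thread
        c1, s1 = sample(mds_name)
        time.sleep(interval)
        c2, s2 = sample(mds_name)
        dc, ds = c2 - c1, s2 - s1
        if dc == 0:
            print("no create requests in the sample window")
            return
        print(f"{dc / interval:.1f} creates/s, "
              f"avg latency {ds / dc * 1000:.1f} ms")

    if __name__ == "__main__":
        main()

Run it on the active MDS host (e.g. `python3 create_latency.py <mds-name>`); comparing its output before and after an upgrade gives the same 2ms-vs-869ms style comparison without depending on the exporter's scrape interval.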
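[Editor's note: a second small sketch, this one for spotting the all_read -> dispatched queueing gap Greg describes. It reads `dump_ops_in_flight` JSON on stdin and prints the time spent between consecutive events for each op. The field names ("ops", "type_data", "events", "time", "event") and the timestamp format are taken from the op dump quoted above; they are assumptions about your exact release, not a guaranteed schema.]

    #!/usr/bin/env python3
    # Sketch: pipe `ceph daemon mds.<name> dump_ops_in_flight` into this and
    # it prints the per-event gaps for each in-flight op, making a long
    # all_read -> dispatched wait easy to spot.
    # ASSUMPTION: top-level "ops" list and timestamps like
    # "2020-04-13 15:30:15.707637", as in the dump quoted above.
    import json
    import sys
    from datetime import datetime

    FMT = "%Y-%m-%d %H:%M:%S.%f"

    def main():
        dump = json.load(sys.stdin)
        for op in dump.get("ops", []):
            print(op.get("description", "<no description>"))
            prev = None
            for ev in op["type_data"]["events"]:
                t = datetime.strptime(ev["time"], FMT)
                if prev is not None:
                    gap_ms = (t - prev[1]).total_seconds() * 1000
                    print(f"  {prev[0]:>30} -> {ev['event']:<30} {gap_ms:9.3f} ms")
                prev = (ev["event"], t)

    if __name__ == "__main__":
        main()

Usage would be something like `ceph daemon mds.<name> dump_ops_in_flight | python3 op_gaps.py`; for the op quoted above it would show roughly 74 ms between all_read and dispatched, i.e. time the request sat in the dispatch queue before the main server loop picked it up.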