On Tue, Apr 14, 2020 at 6:17 AM Josh Haft <paccrap@xxxxxxxxx> wrote:
>
> On Mon, Apr 13, 2020 at 4:14 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > On Mon, Apr 13, 2020 at 1:33 PM Josh Haft <paccrap@xxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > I upgraded from 13.2.5 to 14.2.6 last week and am now seeing
> > > significantly higher latency on various MDS operations. For example,
> > > the 2min rate of ceph_mds_server_req_create_latency_sum /
> > > ceph_mds_server_req_create_latency_count for an 8hr window last
> > > Monday, prior to the upgrade, was an average of 2ms. Today, however,
> > > the same stat shows 869ms. Other operations, including open, readdir,
> > > rmdir, etc., are also taking significantly longer.
> > >
> > > Here's a partial example of an op from dump_ops_in_flight:
> > >     {
> > >         "description": "client_request(client.342513090:334359409 create #...)",
> > >         "initiated_at": "2020-04-13 15:30:15.707637",
> > >         "age": 0.19583208099999999,
> > >         "duration": 0.19767626299999999,
> > >         "type_data": {
> > >             "flag_point": "submit entry: journal_and_reply",
> > >             "reqid": "client.342513090:334359409",
> > >             "op_type": "client_request",
> > >             "client_info": {
> > >                 "client": "client.342513090",
> > >                 "tid": 334359409
> > >             },
> > >             "events": [
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.707637",
> > >                     "event": "initiated"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.707637",
> > >                     "event": "header_read"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.707638",
> > >                     "event": "throttled"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.707640",
> > >                     "event": "all_read"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.781935",
> > >                     "event": "dispatched"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.785086",
> > >                     "event": "acquired locks"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.785507",
> > >                     "event": "early_replied"
> > >                 },
> > >                 {
> > >                     "time": "2020-04-13 15:30:15.785508",
> > >                     "event": "submit entry: journal_and_reply"
> > >                 }
> > >             ]
> > >         }
> > >     }
> > >
> > > This, along with every other 'create' op I've seen, has a 50ms+ delay
> > > between the all_read and dispatched events - what is happening during
> > > this time?
> >
> > The incoming client request messages are read on a messenger thread,
> > and then they get transferred into a queue for the main server event
> > loop. "all_read" means the message has been read off the wire (in the
> > messenger reader thread), and "dispatched" means the main server loop
> > thread has started work on it.
> > So a long wait in that period has nothing to do with the particular
> > message which is waiting; it just indicates that the server is
> > somewhat backed up.
> >
> Thanks for the clarification - the total op time does include the wait
> between those two events, though - correct?
>
> > In this case, once processing started it took about 3 ms to get locks
> > (which IIRC probably means it had to queue and wait for something else
> > to happen), and then it sent back a reply and queued up the journal
> > event (i.e., sent off the write to disk) in another half a millisecond.
> >
> > Anyway, I don't off-hand know of any reason things should have slowed
> > down (really, I'd expect them to be a little faster as a bit more
> > stuff got parallelized). Has the number of requests being processed
> > by the MDS changed? Are your clients noticing the apparent new
> > slowness?
> >
> The client workload should be roughly the same, though the number
> being processed has dropped from over 6000/s to under 500/s,
> presumably because of the increase in time to process each. Clients
> are definitely noticing this increased latency.
>

Are your ceph binaries a debug version?

> Thanks,
> Josh
>
> > > I'm not sure what I'm looking for in the MDS debug logs.
> > >
> > > We have a mix of clients from 12.2.x through 14.2.8; my plan was to
> > > upgrade those pre-Nautilus clients this week. There is only a single
> > > MDS rank with 1 backup. Other functions of this cluster - RBDs and
> > > RGW - do not appear impacted, so this looks limited to the MDS. I did
> > > not observe this behavior after upgrading a dev cluster last month.
> > >
> > > Has anyone seen anything similar? Thanks for any assistance!
> > > Josh

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
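For anyone chasing a similar slowdown, the per-op event timestamps from
dump_ops_in_flight can be turned into per-stage latencies with a small
script. This is only a sketch, not part of Ceph: the script name is made
up, and it assumes the full dump wraps the ops in a top-level "ops" list
(only a single op is quoted above).

#!/usr/bin/env python3
# op_event_deltas.py - print the time spent between consecutive events
# for each MDS op, e.g. the all_read -> dispatched gap discussed above.
import json
import sys
from datetime import datetime

def parse_ts(ts):
    # Timestamps in the dump look like "2020-04-13 15:30:15.707637".
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

def main():
    dump = json.load(sys.stdin)
    for op in dump.get("ops", []):
        print(op.get("description", "?"))
        events = op.get("type_data", {}).get("events", [])
        for prev, cur in zip(events, events[1:]):
            delta_ms = (parse_ts(cur["time"]) -
                        parse_ts(prev["time"])).total_seconds() * 1000.0
            print("  %-15s -> %-30s %10.3f ms"
                  % (prev["event"], cur["event"], delta_ms))

if __name__ == "__main__":
    main()

Fed with something like
    ceph daemon mds.<name> dump_ops_in_flight | python3 op_event_deltas.py
the op quoted above would show roughly 74 ms between all_read and
dispatched, and about 3 ms between dispatched and "acquired locks",
matching the discussion in the thread.

The 2ms-versus-869ms figures in the original post come from the usual
counter arithmetic: divide the increase of the _sum counter by the
increase of the _count counter over the same window. A tiny illustration
(the sample values are invented purely to show the arithmetic):

# Two scrapes of the counters named above, taken 2 minutes apart.
sum_t1, count_t1 = 1234.5, 600000   # ceph_mds_server_req_create_latency_sum / _count
sum_t2, count_t2 = 1410.9, 600203
avg_s = (sum_t2 - sum_t1) / (count_t2 - count_t1)
print("avg create latency: %.0f ms" % (avg_s * 1000.0))   # -> 869 ms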