Hi Cephers,

We have a Ceph cluster running 0.80.9 with 36 OSDs and 3 replicas. Recently, some OSDs have been reporting slow requests and cluster performance has degraded. From the log of one OSD, I can see that all of the slow requests result from waiting for the replica OSDs to complete their sub-ops. The replica OSDs involved are not always the same ones; they can be any two OSDs in the cluster.

2016-01-06 08:17:11.887016 7f175ef25700 0 log [WRN] : slow request 1.162776 seconds old, received at 2016-01-06 08:17:11.887092: osd_op(client.13302933.0:839452 rbd_data.c2659c728b0ddb.0000000000000024 [stat,set-alloc-hint object_size 16777216 write_size 16777216,write 12099584~8192] 3.abd08522 ack+ondisk+write e4661) v4 currently waiting for subops from 24,31

I dumped out the historic ops of the OSD and noticed the following:

1) About 8 seconds waiting for the replies from the replica OSDs:

    { "time": "2016-01-06 08:17:03.879264", "event": "op_applied" },
    { "time": "2016-01-06 08:17:11.684598", "event": "sub_op_applied_rec" },
    { "time": "2016-01-06 08:17:11.687016", "event": "sub_op_commit_rec" },

2) More than 3 seconds spent in the journal write queue and about 2 seconds to write the journal:

    { "time": "2016-01-06 08:19:16.887519", "event": "commit_queued_for_journal_write" },
    { "time": "2016-01-06 08:19:20.109339", "event": "write_thread_in_journal_buffer" },
    { "time": "2016-01-06 08:19:22.177952", "event": "journaled_completion_queued" },

Any ideas or suggestions? BTW, I checked the underlying network with iperf and it looks fine.

Thanks,
Jevon
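P.S. For reference, the historic-ops dump above came from the OSD admin socket, along these lines (the OSD id and socket path are placeholders for my primary OSD, not the exact ones I used):

    ceph daemon osd.<id> dump_historic_ops
    # or, equivalently, via the socket path directly:
    ceph --admin-daemon /var/run/ceph/ceph-osd.<id>.asok dump_historic_ops

The network check was a plain iperf run between the OSD hosts, e.g.:

    iperf -s                          # on one OSD host
    iperf -c <other-osd-host> -t 30   # from another OSD host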