Re: Any suggestion to deal with slow request?

I am facing the same problem.

My osd.7 is reporting slow requests, and many PGs are stuck in the
active+recovery_wait state.

I checked the network and the device backing osd.7 and found no errors.

Have you solved your problem?
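
In case it is useful to anyone else hitting this, here is a minimal sketch
of the checks that can be run (the OSD id is an example, and 'ceph osd
perf' may not exist on older releases):

    ceph health detail                    # which PGs/OSDs are implicated
    ceph osd perf                         # per-OSD commit/apply latency
    ceph daemon osd.7 dump_ops_in_flight  # requests currently stuck on osd.7
    ceph daemon osd.7 dump_historic_ops   # recently completed slow requests

('ceph daemon' has to be run on the host carrying osd.7; the equivalent
long form is 'ceph --admin-daemon /var/run/ceph/ceph-osd.7.asok <command>'.)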

2016-01-08 13:06 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:
>
> Hello,
>
>
> On Fri, 8 Jan 2016 12:22:04 +0800 Jevon Qiao wrote:
>
>> Hi Robert,
>>
>> Thank you for the prompt response.
>>
>> The OSDs are built on XFS and the drives are Intel SSDs. Each SSD is
>> split into two partitions, one for the journal and the other for data.
>> There is no alignment issue with the partitions.
>>
>
> As Robert said, details. All of them can be crucial.
>
> The missing detail here is which exact model of Intel SSDs.
>
> What you're describing below is not typical for Intel DC type SSDs (they
> perform at full speed and are very consistent at that).
>
> My suspicion is that you're using consumer grade SSDs.
>
>
>> When the slow request message is logged, the workload on the
>> replication OSDs is quite light:
>>
>>     Device:  rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>>     sda        0.00    0.00  0.50  30.00   0.00   0.18     12.33      0.00   0.08   0.08   0.25
>>     sdb        0.00    0.50  0.50  78.00   0.00   0.75     19.57      0.09   1.20   0.08   0.60
>>     sdc        0.00    0.50  0.00  28.00   0.00   0.24     17.75      0.01   0.32   0.11   0.30
>>
>
> Look into atop; it gives you (with a big enough window) a very
> comprehensive view of what your system is doing and where the
> bottlenecks are likely to be.
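>
> For example (a sketch; the interval is arbitrary):
>
>     atop 5          # per-disk and per-process view, refreshed every 5s
>     iostat -x 5     # extended device stats; watch await, svctm and %util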
>
>> I benchmarked some OSDs with 'ceph tell osd.x bench' and learned that
>> the throughput for some OSDs (where disk usage is over 60%) is about
>> 21 MB/s, which seems abnormal.
>>
>>     $ ceph tell osd.24 bench
>>     { "bytes_written": 1073741824,
>>        "blocksize": 4194304,
>>        "bytes_per_sec": "22995975.000000"}
>>
>> But the throughput for some newly added OSDs can reach 370 MB/s. I
>> suspect it is related to the SSDs' garbage collection. If so, it might
>> explain why it takes such a long time to write the journal. Any ideas?
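>>
>> For reference, a quick way to compare all of the OSDs is a loop like
>> the one below (ids 0-35 are assumed; bytes_per_sec is plain bytes, so
>> 22995975 is about 21.9 MB/s):
>>
>>     for i in $(seq 0 35); do
>>         echo "osd.$i"; ceph tell osd.$i bench
>>     done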
>>
> There are lots of threads on this ML about which types of SSD are
> suitable for journals and which are not.
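>
> One quick test that often settles it is small O_DSYNC writes, which is
> roughly what journal writes are; below is a sketch with fio (the target
> file is an example; point it at a scratch file on the SSD, not at the
> live journal partition):
>
>     fio --name=journal-test --filename=/mnt/ssd/fio.tmp --size=1G \
>         --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 \
>         --iodepth=1 --runtime=60 --time_based
>
> Consumer drives often collapse to a few hundred IOPS on this, while DC
> class drives stay fast and consistent.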
>
> Regards,
>
> Chibi
>> Another phenomenon is that the journal write is queued in the writeq
>> for 3 seconds. I checked the corresponding logic in
>> FileJournal::submit_entry() and FileJournal::write_thread_entry(), but
>> did not find anything suspicious.
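>>
>> For what it is worth, the journal timings can also be read from the
>> OSD's perf counters; the socket path and the counter names it returns
>> are assumptions here and may vary between releases:
>>
>>     ceph --admin-daemon /var/run/ceph/ceph-osd.<id>.asok perf dump \
>>         | python -m json.tool | grep -i journal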
>>
>> Thanks,
>> Jevon
>> On 8/1/16 00:43, Robert LeBlanc wrote:
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA256
>> >
>> > What is the file system on the OSDs? Anything interesting in
>> > iostat/atop? What are the drives backing the OSDs? A few more details
>> > would be helpful.
>> > - ----------------
>> > Robert LeBlanc
>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>> >
>> >
>> > On Wed, Jan 6, 2016 at 9:03 PM, Jevon Qiao  wrote:
>> >> Hi Cephers,
>> >>
>> >> We have a Ceph cluster running 0.80.9, which consists of 36 OSDs with
>> >> 3 replicas. Recently, some OSDs keep reporting slow requests and the
>> >> cluster's performance has degraded.
>> >>
>> >> From the log of one OSD, I observe that all the slow requests result
>> >> from waiting for the replicas to complete. The replication OSDs are
>> >> not always the same specific ones; they could be any two other OSDs.
>> >>
>> >> 2016-01-06 08:17:11.887016 7f175ef25700  0 log [WRN] : slow request
>> >> 1.162776 seconds old, received at 2016-01-06 08:17:11.887092:
>> >> osd_op(client.13302933.0:839452
>> >> rbd_data.c2659c728b0ddb.0000000000000024 [stat,set-alloc-hint
>> >> object_size 16777216 write_size 16777216,write 12099584~8192]
>> >> 3.abd08522 ack+ondisk+write e4661) v4 currently waiting for subops
>> >> from 24,31
>> >>
>> >> I dumped the historic ops of the OSD and noticed the following:
>> >> 1) wait about 8 seconds for the replies from the replica OSDs.
>> >>                      { "time": "2016-01-06 08:17:03.879264",
>> >>                        "event": "op_applied"},
>> >>                      { "time": "2016-01-06 08:17:11.684598",
>> >>                        "event": "sub_op_applied_rec"},
>> >>                      { "time": "2016-01-06 08:17:11.687016",
>> >>                        "event": "sub_op_commit_rec"},
>> >>
>> >> 2) spend more than 3 seconds in writeq and 2 seconds to write the
>> >> journal.
>> >>                      { "time": "2016-01-06 08:19:16.887519",
>> >>                        "event": "commit_queued_for_journal_write"},
>> >>                      { "time": "2016-01-06 08:19:20.109339",
>> >>                        "event": "write_thread_in_journal_buffer"},
>> >>                      { "time": "2016-01-06 08:19:22.177952",
>> >>                        "event": "journaled_completion_queued"},
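>> >>
>> >> The dump above is from the OSD admin socket, along the lines of the
>> >> command below (the socket path is the default one and may differ):
>> >>
>> >>     ceph --admin-daemon /var/run/ceph/ceph-osd.<id>.asok dump_historic_ops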
>> >>
>> >> Any ideas or suggestions?
>> >>
>> >> BTW, I checked the underlying network with iperf and it works fine.
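>> >>
>> >> A typical iperf check, with the flags only as an example:
>> >>
>> >>     iperf -s                      # on one node
>> >>     iperf -c <peer> -t 30 -P 4    # from the other, then swap roles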
>> >>
>> >> Thanks,
>> >> Jevon
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



