I am facing the same problem. My osd.7 reports slow requests, and many PGs are stuck in active+recovery_wait. I checked the network and the device behind osd.7; there are no errors. Have you solved your problem?

2016-01-08 13:06 GMT+08:00 Christian Balzer <chibi@xxxxxxx>:
>
> Hello,
>
>
> On Fri, 8 Jan 2016 12:22:04 +0800 Jevon Qiao wrote:
>
>> Hi Robert,
>>
>> Thank you for the prompt response.
>>
>> The OSDs are built on XFS and the drives are Intel SSDs. Each SSD is
>> split into two partitions: one for the journal, the other for data.
>> There is no alignment issue with the partitions.
>>
>
> As Robert said, details. All of them can be crucial.
>
> The missing detail here is which exact model of Intel SSDs.
>
> What you're describing below is not typical for Intel DC type SSDs (they
> perform at full speed and are very consistent at that).
>
> My suspicion is that you're using consumer grade SSDs.
>
>
>> When the slow request message is logged, the workload on the
>> replication OSDs is quite light:
>>
>> Device:  rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>> sda        0.00    0.00  0.50  30.00   0.00   0.18     12.33      0.00   0.08   0.08   0.25
>> sdb        0.00    0.50  0.50  78.00   0.00   0.75     19.57      0.09   1.20   0.08   0.60
>> sdc        0.00    0.50  0.00  28.00   0.00   0.24     17.75      0.01   0.32   0.11   0.30
>>
>
> Look into atop, it gives you (with a big enough window) a very
> encompassing view of what your system is doing and where bottlenecks are
> likely to be.
>
>> I benchmarked some OSDs with 'ceph tell osd.x bench' and learned that
>> the throughput for some OSDs (where disk usage is over 60%) is 21MB/s,
>> which seems abnormal.
>>
>> $ ceph tell osd.24 bench
>> { "bytes_written": 1073741824,
>>   "blocksize": 4194304,
>>   "bytes_per_sec": "22995975.000000"}
>>
>> But the throughput for some newly added OSDs can reach 370MB/s. I
>> suspect it is related to the GC of the SSD. If so, it might explain why
>> it takes such a long time to write the journal. Any idea?
>>
> There are lots of threads in this ML about which types of SSDs are
> suitable for journals or not.
>
> Regards,
>
> Chibi
>> Another phenomenon is that the journal_write is queued in the writeq for
>> 3 seconds. I checked the corresponding processing logic in
>> FileJournal::submit_entry() and FileJournal::write_thread_entry(), but
>> did not find anything suspicious.
>>
>> Thanks,
>> Jevon
>> On 8/1/16 00:43, Robert LeBlanc wrote:
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA256
>> >
>> > What is the file system on the OSDs? Anything interesting in
>> > iostat/atop? What are the drives backing the OSDs? A few more details
>> > would be helpful.
>> > - ----------------
>> > Robert LeBlanc
>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>> >
>> >
>> > On Wed, Jan 6, 2016 at 9:03 PM, Jevon Qiao wrote:
>> >> Hi Cephers,
>> >>
>> >> We have a Ceph cluster running 0.80.9, which consists of 36 OSDs with
>> >> 3 replicas. Recently, some OSDs keep reporting slow requests and the
>> >> cluster's performance has degraded.
>> >>
>> >> From the log of one OSD, I observe that all the slow requests result
>> >> from waiting for the replicas to complete. And the replication OSDs
>> >> are not always some specific ones but could be any other two OSDs.
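(A quick aside for anyone triaging the same symptom: each slow-request warning names the replica OSDs being waited on, so tallying those ids across an OSD log shows whether the waits really are spread over arbitrary replicas, as described above, or concentrate on one or two slow OSDs. Below is a minimal sketch in Python; the log path is only an example for a default layout, and the regex assumes the Firefly-era "currently waiting for subops from X,Y" wording quoted just below.)

#!/usr/bin/env python
# Tally which replica OSDs appear in "waiting for subops from X,Y" warnings
# in an OSD log. Log path and message wording are assumptions based on the
# thread above; adjust both for your own setup.
import re
import sys
from collections import Counter

# Example default path for osd.7; pass your own log as the first argument.
log_path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph/ceph-osd.7.log"
pattern = re.compile(r"waiting for subops from ([0-9,]+)")

counts = Counter()
with open(log_path) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            # "24,31" -> count each replica OSD id separately
            counts.update(match.group(1).split(","))

for osd_id, hits in counts.most_common():
    print("osd.%s: %d slow subops" % (osd_id, hits))

If one OSD id dominates the tally, benchmarking that OSD on its own (as done with 'ceph tell osd.x bench' earlier in the thread) is a reasonable next step.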
>> >>
>> >> 2016-01-06 08:17:11.887016 7f175ef25700 0 log [WRN] : slow request
>> >> 1.162776 seconds old, received at 2016-01-06 08:17:11.887092:
>> >> osd_op(client.13302933.0:839452
>> >> rbd_data.c2659c728b0ddb.0000000000000024 [stat,set-alloc-hint
>> >> object_size 16777216 write_size 16777216,write 12099584~8192]
>> >> 3.abd08522 ack+ondisk+write e4661) v4 currently waiting for subops
>> >> from 24,31
>> >>
>> >> I dumped out the historic ops of the OSD and noticed the following:
>> >>
>> >> 1) It waits about 8 seconds for the replies from the replica OSDs:
>> >>     { "time": "2016-01-06 08:17:03.879264",
>> >>       "event": "op_applied"},
>> >>     { "time": "2016-01-06 08:17:11.684598",
>> >>       "event": "sub_op_applied_rec"},
>> >>     { "time": "2016-01-06 08:17:11.687016",
>> >>       "event": "sub_op_commit_rec"},
>> >>
>> >> 2) It spends more than 3 seconds in the writeq and 2 seconds writing
>> >> the journal:
>> >>     { "time": "2016-01-06 08:19:16.887519",
>> >>       "event": "commit_queued_for_journal_write"},
>> >>     { "time": "2016-01-06 08:19:20.109339",
>> >>       "event": "write_thread_in_journal_buffer"},
>> >>     { "time": "2016-01-06 08:19:22.177952",
>> >>       "event": "journaled_completion_queued"},
>> >>
>> >> Any ideas or suggestions?
>> >>
>> >> BTW, I checked the underlying network with iperf, and it works fine.
>> >>
>> >> Thanks,
>> >> Jevon
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
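(An aside for anyone reading the event timings quoted in the thread above: rather than eyeballing the timestamps in a dump_historic_ops dump, the gap between consecutive events of each op can be computed with a short script. Below is a minimal sketch in Python; it assumes the dump was saved to a file, e.g. with `ceph --admin-daemon /var/run/ceph/ceph-osd.24.asok dump_historic_ops > ops.json` (socket path and file name are examples only), and because the enclosing JSON layout differs between Ceph releases it simply walks the whole structure looking for the { "time": ..., "event": ... } records shown above.)

#!/usr/bin/env python
# Print the time spent between consecutive events of each op in a
# dump_historic_ops dump. The surrounding JSON layout differs between
# Ceph releases, so this walks the whole structure and picks out any
# list of { "time": ..., "event": ... } records.
import json
import sys
from datetime import datetime

def parse_ts(ts):
    # Timestamp format as shown in the thread, e.g. "2016-01-06 08:17:03.879264"
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

def event_lists(node):
    # Recursively yield every list whose elements look like event records.
    if isinstance(node, list):
        if node and all(isinstance(e, dict) and "time" in e and "event" in e
                        for e in node):
            yield node
        for item in node:
            for found in event_lists(item):
                yield found
    elif isinstance(node, dict):
        for value in node.values():
            for found in event_lists(value):
                yield found

def main(path):
    with open(path) as f:
        dump = json.load(f)
    for events in event_lists(dump):
        print("---- op with %d events ----" % len(events))
        for prev, cur in zip(events, events[1:]):
            delta = (parse_ts(cur["time"]) - parse_ts(prev["time"])).total_seconds()
            print("%10.6fs  %s -> %s" % (delta, prev["event"], cur["event"]))

if __name__ == "__main__":
    # File name is just an example; pass the path to your own dump.
    main(sys.argv[1] if len(sys.argv) > 1 else "ops.json")

In the excerpt above, the roughly 3.2-second gap between commit_queued_for_journal_write and write_thread_in_journal_buffer is exactly the kind of delta this makes visible, and it points at the journal device rather than the network, consistent with the SSD suspicion discussed earlier in the thread.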