On Fri, Mar 14, 2014 at 9:37 AM, Greg Poirier <greg.poirier@xxxxxxxxxx> wrote: > So, on the cluster that I _expect_ to be slow, it appears that we are > waiting on journal commits. I want to make sure that I am reading this > correctly: > > "received_at": "2014-03-14 12:14:22.659170", > > { "time": "2014-03-14 12:14:22.660191", > "event": "write_thread_in_journal_buffer"}, > > At this point we have received the write and are attempting to write the > transaction to the OSD's journal, yes? > > Then: > > { "time": "2014-03-14 12:14:22.900779", > "event": "journaled_completion_queued"}, > > 240ms later we have successfully written to the journal? Correct. That seems an awfully long time for a 16K write, although I don't know how much data I have on co-located journals. (At least, I'm assuming it's in the 16K range based on the others, although I'm just now realizing that subops aren't providing that information...I've created a ticket to include that diagnostic info in future.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com > I expect this particular slowness due to colocation of journal and data on > the same disk (and it's a spinning disk, not an SSD). I expect some of this > could be alleviated by migrating journals to SSDs, but I am looking to > rebuild in the near future--so am willing to hobble in the meantime. > > I am surprised that our all SSD cluster is also underperforming. I am trying > colocating the journal on the same disk with all SSDs at the moment and will > see if the performance degradation is of the same nature. > > > > On Thu, Mar 13, 2014 at 6:25 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote: >> >> Right. So which is the interval that's taking all the time? Probably >> it's waiting for the journal commit, but maybe there's something else >> blocking progress. If it is the journal commit, check out how busy the >> disk is (is it just saturated?) and what its normal performance >> characteristics are (is it breaking?). >> -Greg >> Software Engineer #42 @ http://inktank.com | http://ceph.com >> >> >> On Thu, Mar 13, 2014 at 5:48 PM, Greg Poirier <greg.poirier@xxxxxxxxxx> >> wrote: >> > Many of the sub ops look like this, with significant lag between >> > received_at >> > and commit_sent: >> > >> > { "description": "osd_op(client.6869831.0:1192491 >> > rbd_data.67b14a2ae8944a.0000000000009105 [write 507904~3686400] >> > 6.556a4db0 >> > e660)", >> > "received_at": "2014-03-13 20:42:05.811936", >> > "age": "46.088198", >> > "duration": "0.038328", >> > <snip> >> > { "time": "2014-03-13 20:42:05.850215", >> > "event": "commit_sent"}, >> > { "time": "2014-03-13 20:42:05.850264", >> > "event": "done"}]]}, >> > >> > In this case almost 39ms between received_at and commit_sent. >> > >> > A particularly egregious example of 80+ms lag between received_at and >> > commit_sent: >> > >> > { "description": "osd_op(client.6869831.0:1190526 >> > rbd_data.67b14a2ae8944a.0000000000008fac [write 3325952~868352] >> > 6.5255f5fd >> > e660)", >> > "received_at": "2014-03-13 20:41:40.227813", >> > "age": "320.017087", >> > "duration": "0.086852", >> > <snip> >> > { "time": "2014-03-13 20:41:40.314633", >> > "event": "commit_sent"}, >> > { "time": "2014-03-13 20:41:40.314665", >> > "event": "done"}]]}, >> > >> > >> > >> > On Thu, Mar 13, 2014 at 4:17 PM, Gregory Farnum <greg@xxxxxxxxxxx> >> > wrote: >> >> >> >> On Thu, Mar 13, 2014 at 3:56 PM, Greg Poirier <greg.poirier@xxxxxxxxxx> >> >> wrote: >> >> > We've been seeing this issue on all of our dumpling clusters, and I'm >> >> > wondering what might be the cause of it. >> >> > >> >> > In dump_historic_ops, the time between op_applied and >> >> > sub_op_commit_rec >> >> > or >> >> > the time between commit_sent and sub_op_applied is extremely high. >> >> > Some >> >> > of >> >> > the osd_sub_ops are as long as 100 ms. A sample dump_historic_ops is >> >> > included at the bottom. >> >> >> >> It's important to understand what each of those timestamps are >> >> reporting. >> >> >> >> op_applied: the point at which an OSD has applied an operation to its >> >> readable backing filesystem in-memory (which for xfs or ext4 will be >> >> after it's committed to the journal) >> >> sub_op_commit_rec: the point at which an OSD has gotten commits from >> >> the replica OSDs >> >> commit_sent: the point at which a replica OSD has sent a commit back >> >> to its primary >> >> sub_op_applied: the point at which a replica OSD has applied a >> >> particular operation to its backing filesystem in-memory (again, after >> >> the journal if using xfs) >> >> >> >> Reads are never served from replicas, so a long time between >> >> commit_sent and sub_op_applied should not in itself be an issue. A lag >> >> time between op_applied and sub_op_commit_rec means that the OSD is >> >> waiting on its replicas. A long time there indicates either that the >> >> replica is processing slowly, or that there's some issue in the >> >> communications stack (all the way from the raw ethernet up to the >> >> message handling in the OSD itself). >> >> So the first thing to look for are sub ops which have a lag time >> >> between the received_at and commit_sent timestamps. If none of those >> >> ever turn up, but unusually long waits for sub_op_commit_rec are still >> >> present, then it'll take more effort to correlate particular subops on >> >> replicas with the op on the primary they correspond to, and see where >> >> the time lag is coming into it. >> >> -Greg >> >> Software Engineer #42 @ http://inktank.com | http://ceph.com >> > >> > > > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com