On Thu, 8 Oct 2015, Deneau, Tom wrote:
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > Sent: Wednesday, October 07, 2015 9:48 PM
> > To: Deneau, Tom
> > Cc: Mark Nelson; Gregory Farnum; ceph-devel@xxxxxxxxxxxxxxx
> > Subject: RE: perf counters from a performance discrepancy
> >
> > > I finally got around to looking at the dump_historic_ops output for
> > > the 1-client and 2-client cases.
> > > As you recall, these are all read ops, so the events in the dump are
> > >   initiated
> > >   reached_pg
> > >   started
> > >   done
> > >
> > > The pattern I see for most of the slow ops recorded in the dump is:
> > >
> > >   * In the 1-client case the typical slow op has a duration of 50-65 ms,
> > >     and usually most of this is the interval between reached_pg and
> > >     started.
> > >
> > >   * In the 2-client case the typical slow op has a duration of 95-120 ms,
> > >     and again usually most of this is the interval between reached_pg
> > >     and started.
> > >
> > > Could someone describe what the interval between reached_pg and
> > > started means?
> >
> > I think the slow part is probably find_object_context() (although to be
> > fair tons of stuff happens here, see do_op()).
> > You could test this theory or otherwise narrow this down with additional
> > event markers like
> >
> > diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
> > index d6f3084..6faccc2 100644
> > --- a/src/osd/ReplicatedPG.cc
> > +++ b/src/osd/ReplicatedPG.cc
> > @@ -1691,10 +1691,12 @@ void ReplicatedPG::do_op(OpRequestRef& op)
> >        return;
> >      }
> >  
> > +    op->mark_event("about to find");
> >      int r = find_object_context(
> >        oid, &obc, can_create,
> >        m->has_flag(CEPH_OSD_FLAG_MAP_SNAP_CLONE),
> >        &missing_oid);
> > +    op->mark_event("found");
> >  
> >      if (r == -EAGAIN) {
> >        // If we're not the primary of this OSD, and we have
> >
> >
> > sage
>
> Sage --
>
> Is it likely that find_object_context would take longer when there are two
> clients each using their own pool (compared to one client using one pool)?
>
> And would two clients using the same pool spend less time in
> find_object_context?

Maybe. find_object_context is where we look up the file, and if there are
caching effects, that would be one place where we'd see it.

sage
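The analysis in this thread amounts to splitting each slow op's event timeline into the gaps between consecutive events and seeing which gap dominates; with the extra "about to find" / "found" markers from the patch above, the reached_pg -> started gap is further split so that time spent in find_object_context() shows up on its own. A minimal sketch of that bookkeeping follows. The OpEvent/OpInterval types and the split_intervals()/longest_interval() helpers are hypothetical, not part of Ceph, and parsing the real dump_historic_ops output into (name, timestamp) pairs is assumed and not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: one timestamped event from an op's history,
// e.g. "reached_pg", "about to find", "found", "started".
// Assumes events were already parsed from dump_historic_ops output.
struct OpEvent {
  std::string name;
  double time_ms;  // event time in milliseconds, any common origin
};

// Hypothetical type: the time spent between two consecutive events.
struct OpInterval {
  std::string from;
  std::string to;
  double duration_ms;
};

// Split an op's timeline into consecutive intervals.
// Precondition (checked with assert): events are sorted by time,
// as they are listed in the dump.
std::vector<OpInterval> split_intervals(const std::vector<OpEvent>& events) {
  std::vector<OpInterval> out;
  for (std::size_t i = 1; i < events.size(); ++i) {
    assert(events[i].time_ms >= events[i - 1].time_ms);
    out.push_back({events[i - 1].name, events[i].name,
                   events[i].time_ms - events[i - 1].time_ms});
  }
  return out;
}

// Return the index of the longest interval, or -1 if there are none.
long longest_interval(const std::vector<OpInterval>& intervals) {
  long best = -1;
  for (std::size_t i = 0; i < intervals.size(); ++i) {
    if (best < 0 ||
        intervals[i].duration_ms >
            intervals[static_cast<std::size_t>(best)].duration_ms) {
      best = static_cast<long>(i);
    }
  }
  return best;
}
```

Feeding each slow op's events through split_intervals() and tallying which interval longest_interval() picks across the 1-client and 2-client runs would show whether the "about to find" -> "found" gap is what grows with the second client, which is the question raised above.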