On Sat, Jul 8, 2017 at 6:15 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Fri, Jun 30, 2017 at 3:25 PM, sheng qiu <herbert1984106@xxxxxxxxx> wrote:
>> Hi,
>>
>> We are trying to reduce the peering processing latency, since it may
>> block front IO.
>>
>> In our experiment, we kill a certain OSD and bring it back after a very
>> short time. We checked the performance counters, shown below (sums are
>> in seconds, avglat in milliseconds):
>>
>> "peering_latency":    { "avgcount": 52, "sum": 52.435308773, "avglat": 1008.371323 },
>> "getinfo_latency":    { "avgcount": 52, "sum":  3.525831625, "avglat":   67.804454 },
>> "getlog_latency":     { "avgcount": 46, "sum":  0.255325943, "avglat":    5.550564 },
>> "getmissing_latency": { "avgcount": 46, "sum":  0.000877735, "avglat":    0.019081 },
>> "waitupthru_latency": { "avgcount": 46, "sum": 48.652836368, "avglat": 1057.670356 }
>>
>> As shown, the average peering latency is 1008 ms, most of which is consumed
>> by "waitupthru_latency". From reading the code, I don't quite understand
>> this part. Can anyone explain it, especially why this stage takes so long?
>
> I think it's described in documentation somewhere, but in brief:
> 1) In order to go active, an OSD must talk to *all* previous OSDs
> which might have modified the PG in question.
> 2) That means it has to go talk to everybody who owned it for an OSDMap interval.
> 3) ...except that could be pointlessly expensive if the cluster was
> thrashing or something and the OSDs weren't actually running during
> that epoch.
> 4) So we introduce an "up_thru" mapping from OSD -> epoch in the OSDMap,
> which tracks when an OSD was alive.
> 5) And before we can go active with a PG, we have to have committed
> (to the OSDMap, via the monitors) that we were up_thru during an interval
> where we own it.
> 6) Then, subsequent OSDs can do some comparisons between old OSDMaps
> and the PG mappings to figure out if an OSD *might* have gone active
> with the PG.
>
> So that waitupthru_latency is measuring how much time the OSD has to
> spend waiting to get a sufficiently new up_thru value committed before
> it can actually go active on the PG. It's not a measure of local work.
>
>>
>> I also noticed there's some description regarding "fast peering" on
>> http://tracker.ceph.com/projects/ceph/wiki/Osd_-_Faster_Peering
>>
>> Is this still ongoing, or stale?
>
> I don't think any serious work was done on it, no.

Before Sam left we looked into whether it was feasible to retain
peer_{info,missing} when the interval changed, to try to avoid the
getlog/getmissing steps. It's not clear to me that this is the same work
as that URL describes, since it mentions "preemptively requesting" the
log+missing, which was not discussed as far as I remember. Shortly
before Sam left we determined that it was only feasible to retain
peer_info and peer_missing if the OSD determines it is still primary
and did not go active in the last interval. Because of this limitation
the work was given a much lower priority, but I hope to revisit it soon
anyway. We did not really discuss waitupthru_latency.

> -Greg

--
Cheers,
Brad
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
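The up_thru gate Greg describes in his numbered steps can be sketched as follows. This is a hypothetical illustration, not actual Ceph code: the class and field names here are invented, and real OSDMaps and past-interval tracking are far more involved. It only shows the condition that waitupthru_latency is waiting on: the monitors must commit an up_thru value for the primary that covers the epoch where its current interval began.

```python
from dataclasses import dataclass, field

@dataclass
class OSDMap:
    """Simplified stand-in for an OSDMap epoch (hypothetical)."""
    epoch: int
    # osd id -> last epoch through which the monitors have committed
    # that this OSD was up ("up_thru")
    up_thru: dict = field(default_factory=dict)

@dataclass
class PG:
    """Simplified placement-group state (hypothetical)."""
    primary: int
    interval_start_epoch: int  # epoch at which the current interval began

def can_go_active(pg: PG, osdmap: OSDMap) -> bool:
    # The primary may finish peering only once its committed up_thru
    # reaches the epoch where its current interval started; otherwise
    # a later peer could not prove this interval might have gone active.
    return osdmap.up_thru.get(pg.primary, 0) >= pg.interval_start_epoch

# The wait itself: osd.3's interval started at epoch 95, but the monitors
# have only committed up_thru=90 for it, so peering blocks...
pg = PG(primary=3, interval_start_epoch=95)
assert not can_go_active(pg, OSDMap(epoch=100, up_thru={3: 90}))

# ...until the monitors commit a new map carrying the updated up_thru.
assert can_go_active(pg, OSDMap(epoch=101, up_thru={3: 101}))
```

This is why the stage is dominated by monitor round-trip and commit time rather than local work: the OSD must request the up_thru bump, and a new OSDMap epoch containing it must be committed and propagated before peering can proceed.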