On Fri, Jun 30, 2017 at 3:25 PM, sheng qiu <herbert1984106@xxxxxxxxx> wrote:
> Hi,
>
> We are trying to reduce the peering processing latency, since it may
> block front IO.
>
> In our experiment, we kill a certain OSD and bring it back after a very
> short time. We checked the performance counters, as below:
>
>     "peering_latency": {
>         "avgcount": 52,
>         "sum": 52.435308773,
>         "avglat": 1008.371323
>     },
>     "getinfo_latency": {
>         "avgcount": 52,
>         "sum": 3.525831625,
>         "avglat": 67.804454
>     },
>     "getlog_latency": {
>         "avgcount": 46,
>         "sum": 0.255325943,
>         "avglat": 5.550564
>     },
>     "getmissing_latency": {
>         "avgcount": 46,
>         "sum": 0.000877735,
>         "avglat": 0.019081
>     },
>     "waitupthru_latency": {
>         "avgcount": 46,
>         "sum": 48.652836368,
>         "avglat": 1057.670356
>     }
>
> As shown, the average peering latency is 1008 ms, and most of it is
> consumed by "waitupthru_latency". Looking at the code, I don't quite
> understand this part. Can anyone explain it, and in particular why it
> takes such a long time in this stage?

I think it's described in the documentation somewhere, but in brief:

1) In order to go active, an OSD must talk to *all* previous OSDs which
   might have modified the PG in question.
2) That means it has to go talk to everybody who owned it for an OSDMap
   interval...
3) ...except that could be pointlessly expensive if the cluster was
   thrashing or something and the OSDs weren't actually running during
   that epoch.
4) So we introduce an "up_thru" mapping from OSD -> epoch of the OSDMap,
   which tracks when an OSD was alive.
5) And before we can go active with a PG, we have to have committed (to
   the OSDMap, via the monitors) that we were up_thru during an interval
   where we own it.
6) Then, subsequent OSDs can do some comparisons between old OSDMaps and
   the PG mappings to figure out if an OSD *might* have gone active with
   the PG.
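The pruning comparison in that last step can be sketched roughly like this (a minimal Python illustration of the idea only; `Interval` and `might_have_gone_active` are hypothetical names, not Ceph's actual C++ code):

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """An OSDMap interval during which some OSD set owned the PG."""
    first_epoch: int
    last_epoch: int


def might_have_gone_active(up_thru: int, interval: Interval) -> bool:
    """An OSD can only have served writes for the PG in this interval if
    the monitors recorded it as up_thru an epoch at or past the interval's
    start. If not, peering can skip querying it for this interval."""
    return up_thru >= interval.first_epoch


# An OSD whose committed up_thru predates the interval can never have
# gone active in it, so peering need not contact it for that interval.
print(might_have_gone_active(up_thru=10, interval=Interval(12, 15)))  # False
print(might_have_gone_active(up_thru=13, interval=Interval(12, 15)))  # True
```

The point is that the check only works if the up_thru update has actually been committed to an OSDMap by the monitors first, which is exactly the wait being measured below.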
So that waitupthru_latency is measuring how much time the OSD has to
spend waiting to get a sufficiently-new up_thru value committed so it can
actually go active on the PG. It's not a measure of local work.

> I also noticed there's some description regarding "fast peering" on
> http://tracker.ceph.com/projects/ceph/wiki/Osd_-_Faster_Peering
>
> Is this still ongoing, or stale?

I don't think any serious work was done on it, no.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html