Hi Marcel,

The peering process is the process used by Ceph OSDs, on a per placement group basis, to agree on the state of that placement group on each of the involved OSDs. In your case, 2/3 of the placement group metadata that needs to be agreed upon/checked is on the nodes that did not undergo maintenance. You also need to consider that the acting primary OSD for everything is now hosted on the OSDs that did not undergo any maintenance. This all means that all of the 'heavy' lifting is done by the nodes that stayed online until the recovery/backfilling process is completed.

Also consider that Ceph will, most likely, execute peering twice per PG: once when the OSDs start again, and once when the recovery and backfilling is finished.

I really don't want to say RTFM, but I don't think it is useful to copy it all here: https://docs.ceph.com/en/latest/dev/peering/#description-of-the-peering-process

"Peering: the process of bringing all of the OSDs that store a Placement Group (PG) into agreement about the state of all of the objects (and their metadata) in that PG. Note that agreeing on the state does not mean that they all have the latest contents."

Below the quoted thread I have added a rough sketch of the throttling knobs you could use for the next rack.

Kind regards,

Wout
42on

________________________________________
From: Marcel Kuiper <ceph@xxxxxxxx>
Sent: Friday, 6 November 2020 10:23
To: ceph-users@xxxxxxx
Subject: Re: high latency after maintenance

Hi Anthony

Thank you for your response.

I am looking at the "OSDs highest latency of write operations" panel of the grafana dashboard found in the ceph source in ./monitoring/grafana/dashboards/osds-overview.json. It is a topk graph that uses ceph_osd_op_w_latency_sum / ceph_osd_op_w_latency_count. During normal operations we sometimes see latency spikes of 4 seconds max, but while bringing the rack back we saw a consistent increase in latency for a lot of OSDs into the 20 seconds range.

The cluster has 1139 OSDs in total, of which we had 5 x 9 = 45 in maintenance.

We did not throttle the backfilling process because we successfully did the same maintenance before on a few occasions for other racks without problems. I will throttle backfills next time we have the same sort of maintenance in the next rack.

Can you elaborate a bit more on what exactly happens during the peering process? I understand that the OSDs need to catch up. I also see that the number of scrubs increases a lot when OSDs are brought back online. Is that part of the peering process?

Thx, Marcel

> HDDs and concern for latency don’t mix. That said, you don’t specify
> what you mean by “latency”. Does that mean average client write
> latency? median? P99? Something else?
>
> If you have a 15 node cluster and you took a third of it down for two
> hours then yeah you’ll have a lot to catch up on when you come back.
> Bringing the nodes back one at a time can help, to spread out the peering.
> Did you throttle backfill/recovery tunables all the way down to 1? In a
> way that the restarted OSDs would use the throttled values as they boot?
>
>
>> On Nov 5, 2020, at 6:47 AM, Marcel Kuiper <ceph@xxxxxxxx> wrote:
>>
>> Hi
>>
>> We had a rack down for 2 hours for maintenance. 5 storage nodes were
>> involved. We had the noout and norebalance flags set before the start
>> of the maintenance.
>>
>> When the systems were brought back online we noticed a lot of OSDs with
>> high latency (in the 20 seconds range), mostly OSDs that are not on the
>> storage nodes that were down. It took about 20 minutes for things to
>> settle down.
>>
>> We're running nautilus 14.2.11.
>> The storage nodes run bluestore and have 9 x 8T HDDs and 3 x SSDs for
>> rocksdb, each with 3 x 123G LVs.
>>
>> - Can anyone give a reason for these high latencies?
>> - Is there a way to avoid or lower these latencies when bringing
>>   systems back into operation?
>>
>> Best Regards
>>
>> Marcel
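PS: since the question about throttling came up above, here is a minimal sketch of what I would set before the next rack maintenance, assuming a Nautilus cluster that uses the centralized config database. The values are examples, not tuned for your cluster:

    # Before stopping the rack: keep the flags you already use and also pause
    # scrubbing, so the post-restart scrub burst does not compete with
    # peering/recovery.
    ceph osd set noout
    ceph osd set norebalance
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # Throttle backfill/recovery. Because these go into the mon config
    # database, the OSDs in the rack will pick them up when they boot again
    # (unlike 'ceph tell ... injectargs', which only reaches running daemons).
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1

    # After the rack is back (ideally one node at a time), clear the flags:
    ceph osd unset norebalance
    ceph osd unset noout
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

    # Once 'ceph -s' reports the PGs active+clean again, drop the overrides:
    ceph config rm osd osd_max_backfills
    ceph config rm osd osd_recovery_max_active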
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx