On Mon, 4 Jan 2016, Guang Yang wrote:
> Hi Cephers,
> Happy New Year! I have a question regarding long PG peering.
>
> Over the last several days I have been looking into the *long peering*
> problem when we start an OSD / OSD host. What I observed was that the
> two peering worker threads were throttled (stuck) when trying to
> queue new transactions (writing the pg log), so the peering process
> slowed down dramatically.
>
> The first question that came to me was: what were the transactions in
> the queue? The major ones, as I saw, included:
>
> - The osd_map and incremental osd_map. This happens if the OSD has
>   been down for a while (in a large cluster), or when the cluster has
>   been upgraded, so that the osd_map epoch the down OSD has is far
>   behind the latest osd_map epoch. While booting, the OSD needs to
>   persist all those osd_maps, which generates lots of filestore
>   transactions (linear in the epoch gap).
>
> As the PG was not involved in most of those epochs, could we only take
> and persist those osd_maps which matter to the PGs on the OSD?

This part should happen before the OSD sends the MOSDBoot message, before
anyone knows it exists. There is a tunable threshold that controls how
recent the map has to be before the OSD tries to boot. If you're seeing
this in the real world, you probably just need to adjust that value way
down to something small(er).

sage
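
To illustrate the boot gating described above, here is a minimal Python sketch. It is not Ceph code: the function names, the `max_epoch_lag` parameter, and the batch size are all hypothetical stand-ins for the unnamed tunable and the map-fetch loop. It only models the idea that the OSD keeps fetching and persisting maps until its epoch is within the threshold of the newest one, and only then announces itself.

```python
def should_send_boot(current_epoch: int, newest_epoch: int, max_epoch_lag: int) -> bool:
    """Return True once the local map is recent enough to announce the OSD.

    max_epoch_lag is a hypothetical name for the tunable threshold.
    """
    return newest_epoch - current_epoch < max_epoch_lag


def catch_up_before_boot(current_epoch: int, newest_epoch: int,
                         max_epoch_lag: int, batch_size: int) -> tuple[int, int]:
    """Simulate fetching maps in batches until the boot condition holds.

    Returns (epoch reached at boot time, number of batches persisted).
    """
    if max_epoch_lag < 1 or batch_size < 1:
        raise ValueError("max_epoch_lag and batch_size must be >= 1")
    batches = 0
    while not should_send_boot(current_epoch, newest_epoch, max_epoch_lag):
        # Each batch stands for a group of osd_maps written before boot.
        current_epoch = min(newest_epoch, current_epoch + batch_size)
        batches += 1
    return current_epoch, batches


if __name__ == "__main__":
    # An OSD that is 5000 epochs behind, with a large and a small threshold.
    for lag in (500, 50):
        epoch, batches = catch_up_before_boot(1000, 6000, lag, batch_size=100)
        print(f"lag={lag}: boots at epoch {epoch} after {batches} batches, "
              f"{6000 - epoch} epochs left to persist after boot")
```

In this toy model a smaller threshold leaves fewer epochs to persist after the OSD has booted, which is the effect of adjusting the value down: more of the map writes happen before peering starts.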