Thanks for your reply. I think there are two scenarios:

1. Peering because a non-primary OSD went down

   For example, the PG goes from [1, 3, 5] to [1, 3, 6]. OSD 1 is the primary throughout, so its data should be correct barring data corruption, and we could read directly from OSD 1 before peering completes. I think we can check "primary of last interval == primary of current interval". If they are equal, the primary has not changed and we are in this scenario. Calculating the last interval and the current interval is much faster than peering, so we can calculate them first and then compare them.

2. Peering because the primary OSD went down

   If the primary OSD goes down, "primary of last interval != primary of current interval", so we need to wait for peering.

-Xiaobing

From: ceph-devel-owner@xxxxxxxxxxxxxxx <ceph-devel-owner@xxxxxxxxxxxxxxx> on behalf of Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: March 19, 2018 17:50
To: xsong682@xxxxxxxx
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Reading data before peered to improve performance

On Sat, Mar 17, 2018 at 8:06 AM, xsong682@xxxxxxxx <xsong682@xxxxxxxx> wrote:
> In the current code, a PG can’t do any operations, including reads, before it is peered, as shown in the following code:
>
> void PrimaryLogPG::do_request(
>   OpRequestRef& op,
>   ThreadPool::TPHandle &handle)
> {
>
>   if (!is_peered()) {
>     // Delay unless PGBackend says it's ok
>     if (pgbackend->can_handle_while_inactive(op)) {
>       bool handled = pgbackend->handle_message(op);
>       assert(handled);
>       return;
>     } else {
>       waiting_for_peered.push_back(op);
>       op->mark_delayed("waiting for peered");
>       return;
>     }
>   }
>
> That impacts performance. I think the primary OSD controls data writes, so if the primary hasn’t changed, the data on the primary OSD is always correct except for silent data errors, and the data can be read safely before the PG is peered.
> If that's ok, another question is how to know the OSD hasn’t changed. Is checking “primary of acting_set == primary of up_set” enough? Should we check intervals to handle the situation where the primary OSD changes “A-->B-->A”?

Knowing if the OSDs changed is what peering is for, and is why we don’t allow reads on unpeered PGs. If we did allow you to read without peering, you might read the data from an old and stale OSD that was fenced off from the rest of the cluster and didn’t know it!
-Greg
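
To make the two positions in this thread concrete, here is a minimal C++ sketch. It is not Ceph code: `Interval`, `Verdict`, `primary_of`, `primary_unchanged`, and `evaluate` are hypothetical names made up for illustration, and the acting set is modeled as a plain vector whose first element is the primary. It shows the proposed "primary of last interval == primary of current interval" check, and why an OSD evaluating that check against its own possibly stale view of the map can reach a different answer than the rest of the cluster.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one PG interval (not Ceph's real types).
struct Interval {
  uint32_t first_epoch;     // map epoch at which this interval began
  std::vector<int> acting;  // acting set; by assumption acting[0] is the primary
};

// Primary of an interval, or -1 if the acting set is empty.
int primary_of(const Interval& i) {
  return i.acting.empty() ? -1 : i.acting.front();
}

// The check proposed in the thread: the primary is the same OSD in the
// last interval and in the current interval.
bool primary_unchanged(const Interval& last, const Interval& cur) {
  const int p = primary_of(cur);
  return p >= 0 && p == primary_of(last);
}

// Result of running the check from two points of view.
struct Verdict {
  bool local_says_unchanged;  // what the OSD concludes from the newest map it has seen
  bool cluster_agrees;        // what the same check yields on the cluster's newest map
};

// Runs the proposed check against the OSD's local view and against the
// cluster's view. A fenced-off OSD only has access to the first answer.
Verdict evaluate(const Interval& last,
                 const Interval& local_cur,
                 const Interval& cluster_cur) {
  // A local view can lag behind the cluster, but cannot be ahead of it.
  assert(local_cur.first_epoch <= cluster_cur.first_epoch);
  return Verdict{primary_unchanged(last, local_cur),
                 primary_unchanged(last, cluster_cur)};
}
```

With last = [1, 3, 5], a local view of [1, 3, 6], and a cluster view that has already moved on to [3, 6, 7], the local check says the primary is unchanged while the cluster's view says otherwise. In this sketch, that gap is the situation the objection above describes, and an OSD in that position has no way to see the second answer on its own.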